August, 1946 


ust 19, 1943, at the post office at Lancaster, Pa., under the Act of March 3, 1879 
1946, by The American Psychological Association, Inc.








Journal of Applied Psychology 








Vol. 30, No. 4 August, 1946 








The Psychological Corporation’s Index of Public Opinion 


Henry C. Link 
The Psychological Corporation, New York City 


This survey of attitudes is the fourteenth in a series begun in February 
1937. It supplements the Psychological Barometer, a series begun in 
September 1932 and conducted quarterly with 10,000 personal inter- 
views,—the oldest poll of its kind in existence. 

The present study involved 5,000 personal interviews during April 
1946 by 479 interviewers in 125 cities and towns, and represents a true 
cross-section of the urban and small town population. Details as to 
questionnaires, socio-economic groups, sex, etc. are given at the end. 

This fourteenth survey deals with attitudes toward industrial and 
political issues in an election year as sampled in April 1946. 


Do Employers and Unions Have Equal Rights? 


The following two questions were asked to throw into bold relief the 
social responsibility of labor unions on the one hand and of businessmen 
on the other: 


Q. “Do you believe that workers and unions have the right to strike when wages 
and working conditions don’t suit them?” 

Q. “Do you believe that businessmen have a right to shut down their factories and 
stores when labor conditions and profits don’t suit them?” 


The right of workers to quit their job at any time has long been taken 
for granted, but in recent years this right has also come to be identified 
with organized labor unions, as many polls have shown. The right of 
owners and managers to shut down their plants as a step in dealing with 
labor unions has been widely questioned. When a large manufacturer 
recently closed his plants with the statement that he could no longer 
operate them under conflicting union and government controls, his action 
was sharply condemned. The answers to the above questions were: 

Have Workers and Unions a Right to Strike?

                              Socio-Economic Groups
Answers             Total      A       B       C       D
Yes                 63.7%     65%     62%     62%     68%
No                  29.3      28      31      30      25
Uncertain            7.0       7       7       8       7

Total interviews    5,000     500   1,500   2,000   1,000





By more than 2 to 1, people answered that workers and unions had
the right to strike. While all income groups were about equal in answer-
ing “yes,” there were substantial numbers who answered “no.” Indeed,
the largest proportion who said “no” was in the middle income groups.
Even among union members, 20% answered “no” though 74% said
“yes.” When it comes to the right of employers to shut down their
factories and stores, the results were quite different.


Have Businessmen a Right to Shut Down?

                              Socio-Economic Groups
Answers             Total      A       B       C       D
Yes                 49.5%     60%     57%     46%     41%
No                  43.6      32      36      47      53
Uncertain            6.9       8       7       7       6

Total interviews    5,000     500   1,500   2,000   1,000





Evidently the sense of fair play or equality exercises some influence be- 
cause almost 50% answered “yes.” However, this percentage was not 
nearly so large in the C and D groups as in the A and B groups, whereas 
in the first question it was about the same in every income group. The 
results by union and non-union members were: 











Businessmen         Union       Non-
Have Right          Members     Union
Yes                   43%        52%
No                    51         41
Uncertain              6          7





Even among union members, a large proportion concede to employers 
the right to shut down. The questions raised by these results are: What 
is the social responsibility of labor unions and businessmen? Do they, 
as unions or corporations under the law, have unlimited rights to strike? 
If not, what are the limitations? 































The Effect of Changing the Order of Two Questions 


Much has been written about the wording of questions but little
about their order. The order of the above two questions was reversed
in the two forms of the questionnaire. The differences were statistically
significant, being four to five times the probable error of 1% for a sample
of 2,500 interviews. However, the magnitude of the differences was
slight:








                                  Where Question    Where Question
                                  on Unions         on Businessmen
                                  Came First        Came First
Workers have right to strike         65.9%             61.6%
Workers do not have right            27.7              30.9
Businessmen have right               52.3              46.7
Businessmen do not have right        41.3              46.0
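The significance claim can be checked roughly as follows. This is only a sketch: the article does not state its formula, so the code assumes simple random sampling, two independent half-samples of 2,500, and the classical convention that the probable error is 0.6745 times the standard error. The function name and the `ROWS` layout are illustrative.

```python
import math

# Classical ratio of probable error to standard error (an assumed convention;
# the article does not say how its "probable error of 1%" was computed).
PROBABLE_ERROR_FACTOR = 0.6745
HALF_SAMPLE = 2500  # interviews per questionnaire form, from the text


def probable_error_of_difference(p1, p2, n1=HALF_SAMPLE, n2=HALF_SAMPLE):
    """Probable error of the difference between two independent proportions."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return PROBABLE_ERROR_FACTOR * standard_error


# Percentages from the table: (unions question first, businessmen question first).
ROWS = {
    "Workers have right to strike": (65.9, 61.6),
    "Workers do not have right": (27.7, 30.9),
    "Businessmen have right": (52.3, 46.7),
    "Businessmen do not have right": (41.3, 46.0),
}

for label, (first, second) in ROWS.items():
    pe = probable_error_of_difference(first / 100, second / 100)
    ratio = abs(first - second) / 100 / pe
    print(f"{label}: difference {abs(first - second):.1f} points, "
          f"probable error {pe * 100:.2f} points, ratio {ratio:.1f}")
```

Under these assumptions the probable error of each difference comes out a little under one percentage point, which is in line with the 1% figure quoted in the text.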





67% Favor Stronger Laws to Regulate the Unions 


The large majority of the American public favor stronger laws to 
regulate the unions. If our survey had included the farm population, 
this majority would probably have been even larger. One aspect of this 
problem was touched on by the following question: 


Q. “If a candidate for Congress promised to support stronger laws to regulate the 
unions, would you vote for him or against him?” 








                              Socio-Economic Groups
Answers             Total      A       B       C       D
Vote for him        66.7%     74%     70%     65%     62%
Vote against him    19.7      16      18      21      22
Uncertain           13.6      10      12      14      16

Total interviews    2,500     250     750   1,000     500





Even more significant than the fact that the large majority favor stronger 
laws to regulate the unions is the finding that a large proportion of union 
members are also of this opinion. The answers by union members vs. 
non-union members were: 











                      Union       Non-
Answers               Members     Union
Would vote for          60%        69%
Would vote against      26         18
Uncertain               14         13













The CIO—PAC Election Purge 


Since the CIO through its Political Action Committee has announced 
its purpose to purge a large number of political candidates, the following 
question has become especially timely: 


Q. “As you know the CIO unions through their Political Action Committee are 
trying to elect a lot of Congressmen. If you knew that a candidate for Congress was 
backed by the CIO, would you be more likely to vote for him or less likely?” 








                              Socio-Economic Groups
Answers             Total      A       B       C       D
Less likely         59.4%     67%     67%     59%     45%
More likely         16.0      10      11      14      31
Neither or d.k.     24.6      23      22      27      24

Total interviews    2,500     250     750   1,000     500





The results by CIO, AFL, and non-union members are especially inter- 
esting: 











                                      Other      Non-
Answers             AFL      CIO      Unions     Union
Less likely         55%      28%       53%        64%
More likely         20       47        20         12
Neither or d.k.     25       25        27         24





Attitudes toward Certain Types of Candidates 


Attitudes toward certain broad classifications of candidates were 
shown through the answers to the question: 


Q. “What kind of man would you be most likely to vote for in Congress: a good 
business executive; a good labor union leader; a good college professor; a good lawyer; 
a good politician?” 














                                Socio-Economic Groups
Answers               Total      A       B       C       D
Business executive    49.2%     56%     57%     49%     34%
Labor union leader    13.9       4       7      15      28
Politician            12.8      12      13      12      16
Lawyer                11.0      14      10      11       9
College professor      5.2       6       6       5       4
Don’t know             7.9       8       7       8       9

Total interviews      2,500     250     750   1,000     500


































Businessmen who consider themselves vilified as a class may take some 
encouragement from these results. Even the union members prefer a 
good businessman to a good union leader: 











                      Union       All
Answers               Members     Others
Business executive      34%        55%
Labor union leader      28          9
Politician              14         12
Lawyer                   9         12
College professor        6          5
Don’t know               9          7
Total interviews       686      1,814





Two Groups of Election Issues 


Aside from specific questions on certain issues, a list of six possible 
issues was presented to one-half of our sample and another list to the 
other half. In respect to each item the person was asked whether he 
thought it “very important or not so important for Congress” to take 
action indicated. Then, after expressing himself (or herself) on the six 
items, he was asked which he considered most important. 











                                         Very     Not               Most
Issues                                   Imp’t    Imp’t    D.K.     Imp’t
Reduce taxes                              57%      35%      8%       14%
Give bonus to veterans                    73       23       4        30
Pass laws to regulate unions              73       18       9        28
Reduce Government control of business     53       32      15        15
Lend 4 billion to England                 29       58      13         6
Lend 1 billion to Russia                  22       65      13         1





Loans to England and Russia are considered least important, while a 
bonus to veterans and regulating the unions are considered most import- 
ant. Reducing taxes is not a paramount issue. On the other half of the 
interviews, the results were: 











                                              Very     Not               Most
Issues                                        Imp’t    Imp’t    D.K.     Imp’t
Cut down the number of Government employees    60%      29%     11%        5%
Reduce the Government debt                     77       15       8        12
Strengthen the Army, Navy and Air Force        71       23       6        15
Get housing for veterans and others            94        5       1        41
Check Communism                                73       19       8        13
Pass laws against race discrimination in
  employment                                   53       37      10        10













All these issues except possibly the last were considered very important
with the housing issue well in the lead.


Communism in the United States


In this survey we repeated a question which was started as a trend 
question in 1937. 


Q. “Do you believe the United States is on the way to Communism?” 














              Feb.     Oct.     Oct.     Oct.     Apr.
Answers       1937     1937     1939     1941     1946
Yes            20%      14%      12%      13%      21%
No             64       64       68       75       65
Uncertain      16       22       20       12       14





This shows a definite trend in convictions which is interesting in view
of the current emphasis on the dangers of fascism. A similar question
on fascism in the 1937 and 1941 studies showed 9% and 8% respectively
answering “yes.” The above results for April 1946, by union and non-
union groups, are especially interesting:











                                 Union       All
Answers                          Members     Others
Yes, headed for Communism          19%        22%
No, not headed for Communism       66         65
Uncertain                          15         13
Total interviews                  725      1,775





In the other half of the interviews this problem was posed as follows: 


Q. “It is being said that Communism is becoming a dangerous thing in the United 
States. Do you think this is true or not?” 











Communism Becoming               Union       All
Dangerous             Total      Members     Others
True                  51.2%        55%        50%
Not true              34.1         29         36
Uncertain             14.7         16         14
Total interviews      2,500       686      1,814





It is noteworthy that union members, among whom Communists are 
reported to be most active, are slightly more fearful of Communism than 
the rest of the population. The answers by socio-economic groups are 
also revealing. 

























Communism Becoming        Socio-Economic Groups
Dangerous               A       B       C       D
True                   43%     51%     55%     50%
Not true               46      39      30      27
Uncertain              11      10      15      23
Total interviews      250     750   1,000     500





The OPA and Government Controls 


Formal and informal polls made recently seem to indicate that an 
overwhelming majority of people favor continuation of the OPA. Un- 
doubtedly the OPA has successfully identified itself in the minds of the 
people as the one agency fighting against higher prices. Therefore, the 
people, when asked about the OPA, naturally favor its continuance. 
The simple question: “Are you in favor of continuing the OPA?” has 
become almost synonymous with the question: ‘Do you want prices kept 
down?” Whether or not the OPA does keep prices down is a matter for 
debate. When public opinion polls try to reduce a highly complicated 
and emotional situation to a simple “yes” or “no” question they are on
dangerous ground. In the survey reported here, two different questions 
were asked, each one with one-half of our sample of 5,000 people. One of 
these questions was: 


Q. “Do you think that the powers of Chester Bowles and the OPA should be in- 
creased or decreased?” 








                              Socio-Economic Groups
Answers             Total      A       B       C       D
Decrease powers     31.6%     42%     35%     30%     24%
Increase powers     25.6      20      24      26      31
Neither             28.6      31      30      30      23
Uncertain           14.2       7      11      14      22

Total interviews    2,500     250     750   1,000     500





This question, while by no means perfect, allows for more than a 
simple all or nothing answer. It will be seen that a majority, 54%, want 
the OPA either continued as at present or with increased powers. The 
32% who want its powers decreased do not necessarily want the OPA 
abolished. What is especially interesting about these results is their 
uniformity by different socio-economic levels. Over 50% in every 
economic level want the OPA continued with equal or greater powers. 
The other half of our sample was asked: 










Q. “What is doing the most to increase the cost of living: strikes and wage in- 
creases; businessmen trying to raise prices; government regulations and restrictions on 
prices and materials?” 








                                        Socio-Economic Groups
Answers                       Total      A       B       C       D
Strikes and wage increases    43.1%     54%     44%     43%     37%
Businessmen                   27.3      23      24      30      30
Government restrictions       26.7      31      28      26      24
Other causes and d.k.         13.2      11      15      12      15
Total interviews              2,500     250     750   1,000     500





These percentages add up to more than 100 because many people gave 
more than one answer. When these answers are divided by union mem- 
bers and non-union members we have the following results: 











                              Union       Non-
Answers                       Members     Union
Strikes and wage increases      33%        46%
Businessmen                     36         24
Government restrictions         27         26
Other causes and d.k.           12         13





Both the union and non-union members, it will be noticed, attribute 
about the same degree of responsibility for higher prices to government 
regulations and restrictions. A higher percentage in both groups blame 
strikes and wage increases and next to these, businessmen trying to raise 
prices. 


Confidence in Government Declines 


Possibly the above results help to account for the decline in the 
people’s confidence in the Government’s reconversion efforts. Beginning 
in October 1941 the following question was asked at six month intervals: 


Q. “Who do you think can do the best job in straightening things out after the 
war: the Government in Washington; Business Leaders; Labor Union Leaders; or 
others?” 

(In April 1946 the wording was changed so that the question was as follows: “Now 
that the war is over who do you think can do the best job of straightening things out at 
home: the Government in Washington; Business Leaders; Labor Union Leaders; or 
others?”)

























                              Oct.     Oct.     Oct.     Apr.
Answers                       1941     1943     1945     1946
Government in Washington       47%      42%      51%      45%
Business leaders               26       28       22       26
Labor union leaders             5        8        9        6
All three together              7        9       12        7
Others or no opinion           17       17       11       16
Total interviews            2,000    2,500    2,500    2,500





While the highest percentage of people still believe in the leadership 
of the Government, it has dropped from a bare majority to 45%. Con- 
fidence in the leadership of union leaders dropped sharply while confidence 
in the leadership of businessmen had a corresponding rise. In view of 
these changes it is especially interesting to classify the answers by union 
and non-union members. 











                              Union       Non-
Answers                       Members     Union
Government in Washington        44%        46%
Business leaders                20         28
Labor union leaders             13          4
All three together               7          9
Others or no opinion            19         15





The union members have more confidence in union leadership than do 
the non-union members. However, the percentage of union members 
who have confidence in business leadership is even higher than the per- 
centage of union members who have confidence in union leadership. 
With regard to government leadership there is little difference either as 
between union and non-union members or as between different economic 
groups. In respect to business leadership and labor leadership the 
confidence varies by different economic groups as follows: 








                              Have confidence in:
                              Business     Union
Socio-Economic Groups         Leaders      Leaders
A  Owner class                  36%           2%
B  White collar                 30            3
C  Skilled industrial           25            6
D  Semi-skilled                 17           13













People Think Themselves Less Prosperous 


Six months of costly strikes, sharp wage increases, many price in- 
creases, and a large increase in the number of returned veterans have had 
their effect. In view of the many arguments about wages, etc., it is 
particularly timely to know whether people think they are more prosper-
ous today than they were two years ago. People’s opinions about their
prosperity were obtained in response to a question which has now been 
asked seven times since October 1941: 


Q. “Is your family more prosperous (or better off) today than two years ago, less 
prosperous, or the same?” 











              Oct.     Oct.     Oct.     Apr.
Answers       1941     1943     1945     1946
Better off     38%      29%      32%      26%
The same       47       46       51       48
Worse off      15       23       15       24
Uncertain      —         2        2        2





Throughout the war years the large majority considered themselves 
more prosperous or at least as prosperous as two years earlier. This 
belief was fully borne out by statistics of the Department of Labor which 
showed, for this period, an increase of 35.5% in the hourly wage rates of 
industrial workers and a 73% increase in the total weekly pay, compared
with a 30% increase in the cost of living. However, the April survey 
revealed a drop of 9% in those who considered themselves as prosperous 
or more prosperous as compared with the survey made six months 
earlier. This drop is reflected among socio-economic groups as follows: 








                          Socio-Economic Groups
                  A             B             C             D
Answers       ’45   ’46     ’45   ’46     ’45   ’46     ’45   ’46
Better off    32%   31%     31%   23%     29%   25%     39%   30%
The same      50    49      53    47      53    50      47    43
Worse off     16    18      15    28      15    24      12    24
Uncertain      2     2       1     2       3     1       2     3





Interestingly enough the answers of union members indicated that 
they were no better off or worse off than the non-union members: 

























              Union       Non-
Answers       Members     Union
Better off      26%        25%
The same        46         49
Worse off       26         24
Uncertain        2          2





Although the white collar and salaried workers represented by group 
B seemed to be feeling the pinch the most, the large majority even in this 
group still considers itself as prosperous or more prosperous than two 
years ago. 


Attitudes toward Five Pressure Groups 


Since highly organized pressure groups and lobbies are playing such a 
large part in the democratic process, the people’s attitude toward such 
groups becomes increasingly significant. It has been claimed that the 
most powerful groups in Washington today are the Farm lobby and 
organized labor. However, the former is not known to the public and 
the latter is represented by two or more distinct organizations. There- 
fore we asked: 


Q. “Which of the following organizations do you think well of and which not so 
well of?” 











                                            Not so
Answers                          Well       Well      Doubtful
The U.S. Chamber of Commerce      65%        11%        24%
The AFL                           50         31         19
The CIO                           26         56         18
Natl. Assn. of Mfrs.              37         17         46
The American Legion               77         15          8





Those who were “doubtful” in some cases had no definite attitude 
and in others did not know the organization. The contrast between the 
attitudes toward the AFL and CIO is noteworthy. It becomes even 
sharper when broken down by CIO and AFL members. The good will 
of the American Legion is outstanding. 


The Probability of Another War 


In view of the tremendous interest in peace and measures for a per-
manent peace, we repeated a question which we asked first in a depth
study in February 1943 (Link, H. C., An experiment in depth interview-
ing on the issue of Internationalism vs. Isolationism. Pub. Opin. Quart.,
1943, 6, 267-279). The question was as follows:


Q. “After this war (or, now that the war is over) do you think that we will make a 
peace settlement that will last, or do you think that we will have another world war in 
twenty-five years or so?” 





                              Feb.     Oct.     Apr.     Oct.     Apr.
Answers                       1943     1944     1945     1945     1946
Will have another war          43%      54%      51%      59%      62%
Will make a lasting peace      47       28       33       28       24
Don’t know                     10       18       16       13       14





Q. “Who do you think will be our next enemy?” 





Answers by Those Who Said There Would be Another War

                    Oct.     Apr.     Oct.     Apr.
Country Named       1944     1945     1945     1946
Russia               29%      27%      37%      45%
Germany               9        6        2        2
Japan                 5        3        5        1
England               4        4        3        4
China                 1        1        1        1
Don’t know            6       10       11        9
Total                54       51       59       62





This reflects a steady and sharp increase in the percentage who expect another
war, except for the April 1945 period which reflected the result of the San 
Francisco Conference. There is a sharp increase in those who believe 
that the next war will be with Russia. Of the 62% who anticipate 
another war, about 72% name Russia as the next foe. 
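The 72% figure is a conditional share: those naming Russia divided by those expecting another war. A small sketch of that arithmetic for all four waves follows; the dictionary names are illustrative, and the April wave labels are taken from the survey dates mentioned in the text.

```python
# Percent of all respondents expecting another war, by survey wave (from the table).
expect_war = {"Oct. 1944": 54, "Apr. 1945": 51, "Oct. 1945": 59, "Apr. 1946": 62}

# Percent of all respondents naming Russia as the next enemy, by survey wave.
named_russia = {"Oct. 1944": 29, "Apr. 1945": 27, "Oct. 1945": 37, "Apr. 1946": 45}

# Share of the war-expecting group that names Russia.
russia_share = {wave: named_russia[wave] / expect_war[wave] for wave in expect_war}

for wave, share in russia_share.items():
    print(f"{wave}: {share:.1%} of those expecting war name Russia")
```

Read this way, the share naming Russia rises from a little over half of the war-expecting group in 1944 to nearly three-quarters in April 1946.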


Explanation of the Survey 


This survey was made during the first three weeks in April with 5,000 
personal interviews in 125 cities and towns representing a cross-section of 
the urban and small town population. Two questionnaires were used, 
each with one-half the sample, so that some questions were asked of 
5,000 people and others of only 2,500. The number of interviews for 
each question is given in the tables. All interviews were made in the 
home, but only one in a family. Half were made with women, half with
men.

The interviews were distributed by four socio-economic groups re-
ferred to in the previous tables as A, B, C, and D. This distribution was
made in accordance with the socio-economic maps in each locality ac-
cording to which the local supervising psychologist assigned the calls to
be made by streets and blocks. The great differences between the
thinking of these various socio-economic groups are shown in some of the 
tables. These differences, incidentally, are also an indication of the 
thoroughness with which these interviews have been distributed by 
socio-economic levels. 


Received May 23, 1946. 








Studies in Job Evaluation: IV. Analysis of Another Point 
Rating Scale for Hourly-Paid Jobs and the 
Adequacy of an Abbreviated Scale 


C. H. Lawshe, Jr., and Salvatore L. Alessi 
Division of Applied Psychology, Purdue University 


Previous studies in this series have analyzed the point rating system
of job evaluation adapted by Kress¹ for use by the National Electrical
Manufacturers Association as applied both to hourly-paid and salary-
paid jobs in industry, and abbreviations of this same rating scale have
been examined and compared with the original system. These studies
have identified the same or similar two or three factors which function in
this rating system, and the abbreviations of the system have yielded re-
sults practically identical to those obtained by the complete scale.²,³,⁴

The present study includes a factor analysis of another point rating
scale⁵ for hourly-paid jobs in industry and an investigation of an ab-
breviation of this system. This system differs from the NEMA plan in
that each job is rated by means of more and finer categories or degrees on
each of the factors, and that the point ratings are translated into “rating
factors” by a logarithmic conversion chart for purposes of assigning
monetary equivalents. The jobs, then, are paid not by labor grades but
by finely graduated “rating factors” which in turn are multiplied by the
common labor wage rate in the community to yield the hourly wage rate
for any given job.

The primary purpose of the study reported here was to analyze
statistically this particular point rating system as used in an industrial
plant. An attempt is also made to relate the basic factors operating in
this system to those item clusters or factors found to be operating in the
NEMA system and its modifications previously reported. Finally, the
validity of an abbreviated point rating scale based on a few best items
was investigated to determine the extent of differences between the two
rating scales and their practical significance.


¹ Kress, A. L., How to rate jobs and men. Factory Management, 1939, pp. 60-65.

² Lawshe, C. H., Jr., and Satter, G. A., Studies in job evaluation: I. Factor analyses
of point ratings for hourly-paid workers in three industrial plants. J. appl. Psychol.,
1944, 28, 189-198.

³ Lawshe, C. H., Jr., Studies in job evaluation: II. The adequacy of abbreviated
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol., 1945,
29, 177-184.

⁴ Lawshe, C. H., Jr., and Maleski, A. H., Studies in job evaluation: III. An analysis
of point ratings for salary-paid jobs in an industrial plant. J. appl. Psychol., 1946, 30,
117-128.

⁵ The names of the authors of the scale and a specific description are withheld at the
request of the company concerned.


Procedure 


Nature of Plan. The plan calls for the rating of industrial occupations 
on seven so-called elements: General Schooling; Training Period; Manual 
Skill; Versatility; Job Knowledge; Responsibility; and Working Condi- 
tions. Each job is rated on the first six of these items or elements by 
means of 10 classes or degrees which vary in point values from element to 
element. Further, the point differentials are not necessarily equal from 
degree to degree within the same element. Finally, the element “work- 
ing conditions” is really 3 elements combined. The rater is required to 
rate each job on three aspects of “working conditions”; namely, “sur- 
rounding conditions,” “minor hazards,” and “major hazards,” and each 
of these is divided into 3 classes or degrees ranging from “normal” thru 
“poor” to “very poor.” Again, the degrees are weighted differently in 
each of these three working conditions scales. The point rating for 
“working conditions” is the total of the three sub-ratings. 

The point rating of each job is the total of the point ratings on each
element. These total point ratings are translated into a “rating factor”
by means of a “conversion chart” which consists of two scales and a
curve. The horizontal scale is arithmetic and represents the total point
rating values and the vertical scale is logarithmic and represents the
“rating factor.” By locating the point rating of a specific job on the
horizontal scale and following this to the point of intersection with the
curve, the “rating factor” for that job can be read at the opposite point
on the vertical scale. The basic wage rate for the job is then determined
by multiplying this “rating factor” or index by the common labor wage
rate in the community or region.
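The conversion from points to wage rate might be sketched as below. The actual chart is not reproduced in the article, so everything numeric here is hypothetical: the two calibration points, the assumption that the curve is a straight line on the semi-logarithmic chart, and the function names are all illustrative only.

```python
import math

# Hypothetical calibration points; the company's real chart is withheld.
POINTS_LOW, FACTOR_LOW = 0.0, 1.0        # assumed: zero points pays the common labor rate
POINTS_HIGH, FACTOR_HIGH = 1000.0, 3.0   # assumed: top of scale pays three times that rate


def rating_factor(total_points):
    """Read a rating factor from an assumed straight line on a semi-log chart.

    The horizontal axis (points) is arithmetic and the vertical axis
    (rating factor) is logarithmic, so we interpolate in log(factor).
    """
    slope = (math.log(FACTOR_HIGH) - math.log(FACTOR_LOW)) / (POINTS_HIGH - POINTS_LOW)
    return math.exp(math.log(FACTOR_LOW) + slope * (total_points - POINTS_LOW))


def hourly_wage(total_points, common_labor_rate):
    """Basic wage rate: rating factor times the community's common labor rate."""
    return rating_factor(total_points) * common_labor_rate


# Example with a hypothetical common labor rate of $0.80 per hour.
for points in (0, 250, 500, 1000):
    print(points, round(hourly_wage(points, 0.80), 3))
```

Because the vertical scale is logarithmic, equal point differences correspond to equal percentage differences in pay under this assumed curve, rather than equal cents-per-hour differences.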

Source of Data. Point rating data were obtained from an industrial 
plant having more than 100 different job classifications. These jobs 
ranged from foremen to plant laborers. 

Procedure. Intercorrelations between the point ratings of each of
the seven elements and the total point ratings were computed and a cor-
relation matrix was prepared (see Table 1). This matrix was subjected
to Thurstone’s centroid method of factor analysis while the “rotation of
axes” followed Peters and Van Voorhis’ technique.⁶


⁶ Peters, C. C., and Van Voorhis, W. R., Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940, pp. 248-278.













Table 1

Intercorrelations of Point Ratings of Each of Seven Items and Total Points in a Job
Evaluation System in an Industrial Plant

Rating Scale Items          (A)     (B)     (C)     (D)     (E)     (F)     (G)
(B) General Schooling      .660
(C) Training Period        .866    .496
(D) Manual Skill           .916    .646    .890
(E) Versatility            .838    .541    .692    .736
(F) Job Knowledge          .883    .650    .721    .774    .786
(G) Responsibility         .919    .646    .776    .796    .762    .871
(H) Working Conditions    -.013   -.269   -.201   -.177    .000   -.149   -.176

Column (A) is Total Points; columns (B) through (G) are the items named in the rows.
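As an illustration of the centroid method named in the Procedure, the first centroid factor can be extracted from Table 1 as sketched below. Two details are assumptions, since the article does not state them: each diagonal cell is filled with the largest absolute correlation in its column (a common communality estimate), and any variable whose correlations sum to a negative value is reflected before summing and has its sign restored afterward. Later factors and the rotation are not attempted here.

```python
import math

LABELS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Lower triangle of Table 1; row i holds correlations with variables 0..i-1.
LOWER = [
    [],
    [.660],
    [.866, .496],
    [.916, .646, .890],
    [.838, .541, .692, .736],
    [.883, .650, .721, .774, .786],
    [.919, .646, .776, .796, .762, .871],
    [-.013, -.269, -.201, -.177, .000, -.149, -.176],
]


def first_centroid_loadings(lower):
    """First centroid factor loadings from a lower-triangular correlation table."""
    n = len(lower)
    r = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            r[i][j] = r[j][i] = value
    # Assumed communality estimate: largest absolute off-diagonal entry.
    for i in range(n):
        r[i][i] = max(abs(r[i][j]) for j in range(n) if j != i)
    # Assumed reflection rule: flip variables with a negative off-diagonal sum.
    signs = [
        -1.0 if sum(r[i][j] for j in range(n) if j != i) < 0 else 1.0
        for i in range(n)
    ]
    reflected = [[signs[i] * signs[j] * r[i][j] for j in range(n)] for i in range(n)]
    column_sums = [sum(reflected[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_sums)
    # Loading = column sum / sqrt(grand total), with the original sign restored.
    return [signs[j] * column_sums[j] / math.sqrt(grand_total) for j in range(n)]


loadings = first_centroid_loadings(LOWER)
for label, value in zip(LABELS, loadings):
    print(f"({label}) {value:+.3f}")
```

Under these assumptions the loadings agree to three decimals with the first unrotated column (k1) reported in Table 2, which suggests, without confirming, that the authors followed a similar procedure.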





The Wherry-Doolittle shrinkage selection method as described by
Stead, Shartle, et al.⁷ was applied to select items for an abbreviated scale.


Factor Analysis Results


Factor Names. The centroid loadings of the factors as they were
derived from the analysis are presented in Tables 2 and 3. Factor I was
found to be common to all the elements except “working conditions” in
comparable magnitude. The loadings for these elements are all within
the range of .795 for “manual skill” to .720 for “training period.” The
elements are listed in Table 3 in order of magnitude of loadings. Since
each of these heavily loaded elements seems to pertain to some general
skill required of the individuals who perform the job successfully the name
given to this factor was “Skill Demands (General).”


Table 2

Factor Loadings Before and After Rotation

                                Before Rotation                 After Rotation
Rating Scale Items          k1      k2      k3      h²      k1      k2      k3      h²
(A) Total Points           .951    .249   -.173    .996    .901    .405    .145    .997
(B) General Schooling      .722   -.123   -.268    .608    .780    .000    .000    .608
(C) Training Period        .875   -.120    .317    .881    .720    .028    .601    .880
(D) Manual Skill           .925   -.145    .246    .937    .795    .012    .552    .937
(E) Versatility            .821    .310   -.104    .781    .746    .443    .162    .779
(F) Job Knowledge          .904    .197    .150    .879    .754    .345    .435    .877
(G) Responsibility         .927    .153    .130    .900    .789    .307    .427    .899
(H) Working Conditions    -.198   -.410   -.213    .253   -.045   -.438   -.243    .253


⁷ Stead, W. H., Shartle, C. L., et al. Occupational counseling techniques. New York:
American Book Co., 1940, pp. 245-252.

Factor II has its heaviest loadings in “versatility” (.443) and “work-
ing conditions” (−.438). This factor was named “Job Characteristics”
since the two elements represent aspects in the job over which the worker
has little control and for which he can do little by way of specific training.
While the term “versatility” implies something about the individual on
the job, actually the scale seems to pertain to the job itself since the
emphasis is almost entirely on the routine or repetitive nature of the job.


Table 3

Three Factors Named with Rating Scale Items Arranged in Order of
Magnitude of Loadings

Factor                      Rating Scale Item           Loading
I.   Skill Demands          (D) Manual Skill              .795
     (General)              (G) Responsibility            .789
                            (B) General Schooling         .780
                            (F) Job Knowledge             .754
                            (E) Versatility               .746
                            (C) Training Period           .720
II.  Job Characteristics    (E) Versatility               .443
                            (H) Working Conditions       -.438
III. Skill Demands          (C) Training Period           .601
     (Specific)             (D) Manual Skill              .552
                            (F) Job Knowledge             .435
                            (G) Responsibility            .427





That the element “working conditions” is negatively loaded with Factor 
II is plausible; the element is none the less representative of the factor and 
its influence is exerted in the negative direction.* There is no question 
about the element “working conditions” representing an aspect of the 
job over which the worker has no control. 

Factor III has its heaviest loadings in “training period” (.601) and
“manual skill” (.552). The “job knowledge” and “responsibility” elements
are also loaded appreciably with this third factor. These elements,
especially the first two, seem also to represent a skill demand, but it is a
more specific type of skill requiring a higher degree of specialized training.
The factor has accordingly been named “Skill Demands (Specific).”


8 Peters, C. C., and Van Voorhis, W. R., op. cit., pp. 270-272.








314 C. H. Lawshe, Jr., and Salvatore L. Alessi 


The only elements not contributing to both of the “Skill Demands”
factors are “general schooling,” “versatility,” and “working conditions.”
“General schooling” is representative of only “Skill Demands (General),”
and “versatility” is identified with the “Job Characteristics” factor as
well as with “Skill Demands (General),” as discussed above. The “working
conditions” element is significantly loaded with Factor II, “Job
Characteristics,” only. It is logically an isolated element, particularly since
it is shown in the correlation matrix to be negatively correlated with the


[Figure 1. Chart segments: 1. Skill Demands (General), 82%; 2. Job Characteristics; 3. Skill Demands (Specific).]

Fig. 1. The relative proportion which each of the factors contributes to the total point
ratings of hourly paid jobs in an industrial plant.


total point ratings and the other elements (see Table 1), and because the
remaining elements contribute to both of the “Skill Demands” factors.

Factor Significance. Assuming the rotation of axes (Table 2) to be
the best possible, the relative proportions that each factor contributes
to the total variability may be estimated by squaring and adding the
three loadings for the total point rating. Since the communality (h²) of
the three factors for the total point rating element is .996, practically all
the variability in the total point rating is shown to be accounted for.
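
The squaring-and-adding step described above can be sketched in a few lines. This is only an illustration: the loadings used here are hypothetical placeholders (the Table 2 values are not reproduced in this section), and the function name `variance_shares` is an assumption introduced for the example.

```python
def variance_shares(loadings):
    """Square each loading, sum to the communality, and return each factor's share."""
    squared = [x * x for x in loadings]
    communality = sum(squared)
    return communality, [s / communality for s in squared]

# Hypothetical loadings of the total point rating on three factors,
# NOT the values from Table 2 of the paper.
hypothetical_loadings = [0.90, 0.40, 0.10]
h2, shares = variance_shares(hypothetical_loadings)
print(f"communality = {h2:.3f}")
for i, share in enumerate(shares, start=1):
    print(f"Factor {i}: {share:.1%} of the accounted-for variability")
```

With the actual loadings from Table 2 in place of the placeholders, the same calculation would presumably yield the proportions shown in Figure 1.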

The proportions are represented graphically in Figure 1. Factor I 

















































Studies in Job Evaluation: IV 315 


“Skill Demands (General)” accounts for 82% of the total variability, 
Factor II “Job Characteristics” 16%, and Factor III “Skill Demands 
(Specific)” 2%. These proportions indicate the relative extent to which 
the various factors contribute to the total variability of the total point 
rating and, in consequence, to the determination of the wage structure, 
since it is based on the total point rating for each job. 

Similarity Between Systems. Although the point rating system ana- 
lyzed in this study differed from the NEMA system analyzed in previous 
studies of this series, the basic factors found to be functioning in both 
systems seem to be comparable. The “Skill Demands” factor found in 
all previous studies to account for most of the variability in the total 
point rating has its counterpart in this system in the “Skill Demands 
(General)” factor which accounts for most of the variability in the total 
point ratings. The “Job Characteristics” factor again ranked second in
contributing to the total point rating in both systems. And finally, a
third factor, which seems to embrace any specific skill demands (visual,
supervisory, or specialized training), accounted for the remaining 2 or 3%
of the variability in the total point ratings of both the NEMA system and
this system.


The Abbreviated Rating Scale


Elements Selected. The Wherry-Doolittle selection method was
applied and three elements were selected for an abbreviated scale. The
three elements are “responsibility,” “manual skill,” and “working
conditions.” These three elements, properly weighted, correlated .983 with
the criterion, the total point rating. If the Wherry-Doolittle process had
been carried further to include a fourth element, the multiple correlation
coefficient would have increased by only .005.

The one element which correlates highest with “total points” is
“responsibility,” the coefficient being .919 (see Table 4). When the 
“manual skill” element is added the multiple correlation coefficient is 


Table 4

Correlation Coefficients Between Ratings on Selected Scale Items and Total Point
Ratings with Standard Errors of Estimate

                                       Correlation    Standard Errors of Estimate
Selected Items                         Coefficients   (Total Points)    Percentage
Responsibility                             .919            52.3            38.6
Responsibility plus Manual Skill           .968            33.3            25.1
Responsibility plus Manual Skill
  plus Working Conditions                  .983            25.0            18.8













increased to .968; and when “working conditions” is added the multiple 
correlation coefficient increases to .983. 

Accuracy of Prediction. Table 4 includes the standard errors of 
estimate for predicting the total point rating from either one, two or three 
items in an abbreviated scale. If only one item were used to predict 
total point ratings, the estimates for approximately two-thirds of the 

























































































[Figure 2. Scattergram; both axes run from 70 to 140 cents in five-cent steps. Horizontal axis: hourly wages, seven elements. Vertical axis label illegible in the source.]

Fig. 2. Scattergram showing relationship between hourly rates based upon total points
(seven items) and rates computed from three items.


jobs would be within 52.3 points of the total point ratings based on all 7 
elements. If the best three items were used, the estimated total point 
ratings for approximately two-thirds of the jobs would be within 25.0 
points of the total point ratings based on all seven elements. The
percentage figures in Table 4 for each of the selected items indicate the
proportional size of the errors in terms of the standard deviation of the
total point distribution.
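
As a rough check on the percentage column, the textbook relation between a correlation coefficient and the standard error of estimate, expressed as a fraction of the criterion's standard deviation, is sqrt(1 - R²). The sketch below assumes that relation; the paper does not state which formula or corrections were actually used, so the results should only be expected to fall near the printed figures (for R = .968 the formula gives about 25 per cent, in line with Table 4). The function name is an assumption made for illustration.

```python
import math

def relative_standard_error(r):
    """Standard error of estimate as a fraction of the criterion's standard deviation."""
    return math.sqrt(1.0 - r * r)

# Correlation coefficients taken from Table 4.
for label, r in [("one item", 0.919), ("two items", 0.968), ("three items", 0.983)]:
    print(f"{label}: R = {r:.3f}, error = {relative_standard_error(r):.1%} of SD")
```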
















Coefficient of Multiple Determination. The coefficient of multiple
determination is .964. Hence, it can be assumed that the three elements,
“responsibility,” “manual skill,” and “working conditions,” in an
abbreviated rating scale account for 96.4% of the total
variation in total points, while the other elements account for the remaining
3.6%.

Application of Abbreviated Scale. To test the practical application
of the three-element abbreviated scale, the changes that would
occur if only the three elements were used were analyzed, and the extent to
which comparable results would be obtained was examined. For this
purpose the regression equation for predicting “total points” from the
“responsibility,” “manual skill,” and “working conditions” ratings was used.
The equation is:


$X_{TP} = 43.0 + 1.7\,X_{R} + 2.7\,X_{S} + .8\,X_{WC}$


Point ratings on “responsibility,” “manual skill,” and “working
conditions” were substituted in the regression equation for each of
the jobs in the plant to obtain the computed ratings. These computed
ratings and the total point ratings for all seven elements of the original
scale were translated into “rating factor” indices, and these were
multiplied by the going hourly-paid wage rate ($.71) for common labor in that
vicinity to yield hourly-paid rates for each job. The money values (in
cents) predicted by the abbreviated scale were then plotted against the
money values derived from the application of the original seven-element
rating scale in the scattergram in Figure 2. The grid lines in the
scattergram indicate five-cent intervals in hourly-paid rates.

Wage Differences. Table 5 summarizes the differences in “predicted” 
and “actual” wage rates paid for the jobs in the plant according to the 


Table 5

Differences in Wage Rates as Computed with All Seven Elements and the
Three Selected Elements

Difference    No. of    Cumulative     Cumulative
in Cents      Jobs      Frequencies    Percentages
    7            3         122            100
    6            5         119             97
    5            6         114             94
    4           13         108             89
    3           19          95             78
    2           30          76             62
    1           32          46             38
    0           14          14             11

Mean Difference = 2.3 cents.
















abbreviated and original rating systems. The greatest difference
between “predicted” and “actual” wage rates is seven cents, and only 3 jobs
showed this large a deviation in wage rates as determined under the two
systems. Ninety-four per cent of the jobs fell within a five-cent
differential, and the average difference is only 2.3 cents.

Since about half of the time a wage rate set by the abbreviated
rating scale would “miss” the “actual” wage rate by less than three
cents, it seems safe to assume that for practical purposes the abbreviated
scale could be substituted for the original. This is significant when it is
realized that a certain rate may be paid at one time or another to
employees on jobs ranging through several “rating factor” indices, because of
seniority in the plant, length of service on the job, or other reasons
that company policy may deem desirable or reasonable. The relative
unreliability of point rating scales further minimizes the significance of
the slight differences in results yielded by the abbreviated and original
scales. In terms of practical operation, the two systems seem almost


Summary and Conclusions 


The operation of a point rating system of job evaluation in an indus- 
trial plant was examined statistically by means of Thurstone’s centroid 
method of factor analysis. By means of the Wherry-Doolittle technique 
an abbreviation of the scale was set up and compared with the original 
scale. The following findings are supported: 


1. There are three primary factors operating and they jointly account 
for practically all (96%) of the variability in total point rating. 

2. The factor which contributes most to the variance in “total points” 
is the “Skill Demands (General)” (82%). This factor represents a 
general skill requirement for the individual who can do the job success- 
fully. 

3. The second factor which contributes 16% to the total variance in 
“total points” is “Job Characteristics.” This was so named because it 
includes certain aspects of the job with which the worker must contend 
and over which he has little control. 

4. The third factor is “Skill Demands (Specific)” and accounts for 
2% of the variation in “total points.” This factor represents the skill 
demands of the job of a more specialized nature. 

5. These results support the finding of previous studies in that the 
basic factors found operating in the NEMA system were also revealed in 
this plan. 


* Lawshe, C. H., Jr., op. cit. 
















6. The abbreviated scale was made up of the “responsibility,” 
“manual skill” and “working conditions” elements. Ratings on selected 
elements have a multiple R of .983 with the total point ratings. 

7. If the abbreviated scale were used in this plant, only three of the 
jobs would be displaced as much as seven cents, 94% of the jobs would 
not be displaced more than five cents, and about half the jobs would not 
be displaced more than 3 cents in the wage structure of the plant. 

8. The practical significance of the slight differences is further
minimized by the flexibility of the wage system and the probable
unreliability of the ratings.

9. The abbreviated scale would yield results practically identical to 
those obtained with the original scale and would greatly reduce the time 
required as well as the complexity of the rating process. 


Received April 20, 1946. 











Output Rates among Butter Wrappers: II. Frequency
Distributions and an Hypothesis Regarding
the “Restriction of Output” *


Harold F. Rothe
Stevenson, Jordan and Harrison, Inc., Chicago, Illinois


In an earlier paper on the analysis of output rates among butter 
wrappers (9), some of the problems involved in the use of work curves 
were discussed. The methodology of this study and the conditions under 
which the data were obtained were described in detail. The purpose of 
the present paper is to describe a further analysis of those data and to 
relate them to other industrial problems. 

Industrial managers appear increasingly to appreciate the
magnitude of the range of individual differences in abilities and in output
rates. Several writers have described distributions
of the output rates of industrial workers. For the most part these
are bell-shaped distributions; many of them appear to be skewed, and no
one appears to have tested these distributions for normality. According
to Evans (3), the ratio between the outputs of the greatest and the least
producers, or the “best” and the “worst” workers, on any job runs about
3 or 4 to 1 for those operations where the pace is set by the workers rather
than by the machines. That is, the “best” worker produces 3 or 4 times
as much as does the “worst” worker in a given period of time. To this
statement by Evans should be added the further qualification: when
the operators are approximately equally experienced. Ratios of this
kind for light manual operations have been published by Hull for heel 
trimmers, 1.4 to 1, and for bottom scourers, 2 to 1 (7, 35). Hull also 
reported a ratio of 5 to 1 for spoon polishers, citing a report by Farmer 
and Brooke. Inspection of the original data, however, reveals that this 
distribution included both “experienced” and “semi-experienced”
operators, and that when the semi-experienced group was given one 
week’s training the ratio dropped to 2 to 1 (4). Tiffin presented polygons 
for electrical fixture assemblers and persons burning, twisting, and 

* This study was made in partial fulfillment of the requirements for the degree Ph.D. 
in psychology at the University of Minnesota. The writer wishes to thank his co- 
chairmen, Professors Donald G. Paterson and Miles A. Tinker, for their many helpful 
suggestions. He is also grateful to Mr. John Brandt, President, and the employees of 
the Minneapolis plant, Land O’Lakes Creameries, Inc., whose cooperation made possible 
this investigation. 











Output Rates among Butter Wrappers: II 321 


soldering the ends of insulated wire. The ratios of these groups appear 
to be approximately 2.5 to 1 (12, 4 ff.). Stead and Shartle have pub- 
lished output histograms with ratios of 4 to 1 for card-punch operators, 
1.5 to 1 for experienced lamp-shade sewers, and 3 to 1 for inexperienced 
lamp-shade sewers (11, 75 ff.). 

A few writers have discussed frequency distributions in connection 
with “restriction of output.” Ford hypothesized that a negatively
skewed distribution was “a fairly certain index of organized restriction”
(5, 78 ff.). Yoder later presented a similar hypothesis that the distribu- 
tion of output data would be skewed negatively and have a lower average 
and less variation if production were restricted than if it were unrestricted 
(14, 282 ff.). Neither of these writers presented evidence to support 
their hypotheses. Bliss found the distribution of earnings for a group of 
bench workers to be positively skewed, and he attributed this skewness 
to a lack of motivation (2). 

Bedford attacked this problem in a different manner, making one 
histogram for each individual shoe factory worker studied (1). These 
distributions, based on factory records over a 20-24 week period, showed 
a tendency toward a common modal rate of production for both fast and 
slow producers, with the fast ones being positively skewed and the slow 
ones negatively skewed. He also found less variation about their own 
means for the fast individuals. He proposed using these two phenomena, 
common mode and differential skewness for both groups, as an objective 
measure of “restriction of output.” 


The Present Problems 


Primary interest in the present investigation was in the work curves 
and their stability. It appeared worthwhile, however, to analyze the 
obtained data in terms of their frequency distributions, to add these 
distributions to the too-small number of such published distributions, 
and to attempt to relate these data to practical industrial problems. It
was hoped to test the hypothesis of Ford and Yoder, and the contradictory 
explanation of Bliss, with the data. The smallness of the present sample 
of operators (eight women) made this impossible. 

The problems that were investigated may be listed briefly in this 
manner: (1) to obtain individual and group frequency distributions of 
output rates for industrial operators; (2) to test these distributions for 
normality; and (3) to investigate any possible phenomena about these 
distributions in relation to a measurement of “restriction of output.” 


“Restriction of Output” Redefined


It is well at this point to indicate more clearly what is meant by the 
term “restriction of output.” This term is generally used to indicate 




































322 Harold F. Rothe 


that workers are producing at a rate lower than the rate they are capable 
of maintaining over a long period of time without suffering any ill effects. 
Ford wrote in 1931 that restriction was probably present in over 90% of 
the major industries of the United States (5, 78 ff.), and Mathewson 
wrote that restriction among organized workers is believed so common as 
to be almost universal and therefore not requiring detailed inquiry to 
ascertain its presence (8, 4 ff.). 

The present writer believes the term is ill-advised because of its
one-sided emotional connotation. That is, the term as it is now commonly
used suggests that the reasons for this phenomenon lie wholly within the
activities of the workers and outside of the actions of management.
Experience in this war has shown that a large number of industrial
phenomena are functions of both management and labor, and that
nothing is to be gained by blaming, even by suggestion, one or the other
of the two parties.

From a psychological point of view, when workers restrict their output
they feel that they have more to gain by producing below their
optimum than they have from producing at their optimum. The 
problem is basically one of incentives. If the incentive is great enough, 
the workers will work at an optimum rate, that is, producing as much as 
they can, steadily, over a long period of time, without endangering their 
health or decreasing their away-from-the-plant activities. If the in- 
centive is not great enough they will work below this optimum, that is, 
they will “restrict” their output. 

The problem of incentives is a problem for both management and 
labor, and also for industrial scientists, to solve. If the term “restriction 
of output” is replaced by the term “ineffectiveness of incentives,” joint
action by all parties may be achieved more easily. That more desirable 
term is used from this point on in the present paper. 





Results 


Frequency distributions were made for each operator showing the
various rates of output she achieved at any time during the two-week
period, with the exception of the first day of the study.¹ The data for
this day were higher than for any other day and were therefore excluded
from all analyses as reflecting a spurious phenomenon. These distributions
of output data are presented in Table 1.












¹ In Part I of this report the only data used covered five days. The data used here
cover the two-week period. Alternating job assignments for the operators forbade the 
construction of usable work curves covering more than 5 days, but all the data were 
used in these distributions regardless of alternating duties. 




Table 1

Frequency of Occurrence of Output Rates for Each Operator, in Terms of Pounds of
Butter Wrapped in Fifteen-minute Periods, Grouped into
Class Intervals of Three Pounds

[Table body garbled in the source; column and row assignments are not recoverable. Surviving entries: 55, 54, 62, 57, 66, 82, 169, 276, 270, 161, 66, 15; 184, 151, 1361; 202; 5, 8, 34, 68, 54, 19, 6, 94; 1; 137, 152.]




All of the distributions in Table 1 were tested for normality by
the Chi Squared test, except that the data for the whole group were
tested with class intervals of 5 rather than of 3 as shown here. At the
1% level of significance, none of the individual distributions differed
significantly from normal. This was also true when the eight Chi
Squares and their respective degrees of freedom were added and the test
was made in terms of a normal deviate with a standard error of 1. By the
method of beta coefficients for testing skewness and kurtosis, however, the
distributions for Operators 1, 2, and 7 were significantly leptokurtic. This
apparent discrepancy between the Chi Squared and the beta coefficient
tests for normality may be attributed to the use of the 1% level in the
Chi Squared test, a stringent criterion that made it difficult for a
distribution to be judged non-normal.
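
The pooled test mentioned above (adding the eight Chi Squares and their degrees of freedom, then referring the result to a normal deviate) can be sketched as follows. The paper does not name the formula; this example assumes Fisher's large-sample approximation, sqrt(2χ²) - sqrt(2df - 1), which is one standard way of obtaining such a deviate. The Chi Square values and degrees of freedom below are hypothetical, since the paper does not report them.

```python
import math

def pooled_normal_deviate(chi_squares, degrees_of_freedom):
    """Pool independent chi-square results into one approximate unit normal deviate."""
    total_chi2 = sum(chi_squares)
    total_df = sum(degrees_of_freedom)
    # Fisher's approximation: sqrt(2*chi2) - sqrt(2*df - 1) is roughly N(0, 1)
    # when the pooled degrees of freedom are large.
    return math.sqrt(2.0 * total_chi2) - math.sqrt(2.0 * total_df - 1.0)

# Hypothetical results for eight operators (not reported in the paper).
hypothetical_chi2 = [9.1, 7.4, 10.2, 6.8, 8.0, 11.5, 9.9, 7.7]
hypothetical_df = [8, 8, 8, 8, 8, 8, 8, 8]

z = pooled_normal_deviate(hypothetical_chi2, hypothetical_df)
ONE_PERCENT_CRITICAL = 2.326  # one-sided 1% point of the unit normal curve
print(f"pooled normal deviate z = {z:.2f}")
print("significant at 1%" if z > ONE_PERCENT_CRITICAL else "not significant at 1%")
```

A one-sided criterion is used here because only large Chi Squares indicate departure from normality; that choice is also an assumption of the sketch.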

The distribution of output rates for the whole group, shown in Table 
1, tested as significantly skewed in the positive direction. It should be 
noted clearly, however, that although this distribution contains hundreds 
of readings, it really refers to a distribution of eight cases only. The
influence of Operators 1 and 6 accounts for the non-normality of this
distribution. That is, two operators produced at a sufficiently faster 










rate than did the other six operators so that the distribution of group 
data was positively skewed. 

In a manner similar to the above, distributions were also made for
each operator and for the group on another of the operations they
performed during the two-week period. This operation, briefly,
consisted of wrapping one-pound blocks of butter in a manner essentially
similar to the manner of wrapping quarter-pound blocks. In that
operation, the distributions for all operators tested as normal according to the
Chi Squared and the beta coefficient methods, except the distribution
of Operator 4, which was non-normal at the 1% level on the Chi Squared
test.²

In summary, it appears proper to say that these distributions are
bell-shaped and tend to approximate the normal frequency distribution. 
This conclusion must be limited to these data and a more general state- 
ment cannot be made at this time. 

In connection with Bedford’s hypothesis of measuring “restriction”
it is noteworthy that although six of the eight operators have fairly close 
means, there is no common mode for both fast and slow producers, nor 
are the distributions for the fast workers skewed positively and those for 
the slow workers skewed negatively. 

The ranges of inter-individual differences and intra-individual
differences were obtained for both wrapping operations that have been
described. These data are summarized in Tables 2 and 3.


Table 2

Data on Inter- and Intra-individual Differences in Operation of Wrapping Four
Quarter-pound Blocks of Butter in Fifteen-minute Periods

                                 Operator Number
                  1      2      3      4      5      6      7      8
Mean rate       51.5   32.2   32.4   32.3   29.9   44.0   30.8   30.5
Fastest rate     …     41.    43.    40.    41.    52.    56.    40.
Slowest rate     …     21.    22.    20.    20.    32.    21.    23.
Ratio*          1.78   1.95   2.41   2.00   2.05   1.63   2.67   1.74

* Ratio of each operator’s fastest to her slowest rate.











The ratios of fastest to slowest rate were used as measures of the 
ranges of intra-individual differences. The ratios between the mean 
rates of the fastest operator and the mean rates of the slowest operator 


² In testing the distributions for this second operation, corrections were made for the
small number of cases. Yates’ correction for continuity was used in the Chi Squared 
test (6, 102), and kurtosis was tested by obtaining the standard errors of the g coefficients 
(6, 29). 







were used as measures of the range of inter-individual differences in this 
situation. These latter ratios were 1.69 and 1.77, respectively, for the 
data in Tables 2 and 3. These latter two ratios were surprisingly alike, 
considering the small sample used in this investigation, and led to the 
adoption of the single ratio 1.73 to 1 as expressing the range of inter- 
individual differences here. 


Table 3

Data on Inter- and Intra-individual Differences in Operation of Wrapping One-pound
Blocks of Butter in Fifteen-minute Periods

                           Operator Number
                  2      3      4      5      6      7
Mean rate       62.4   58.8   50.8   48.5   72.6   61.2
Fastest rate    78.    80.    92.    76.    91.    88.
Slowest rate    48.    30.    32.    32.    52.    34.
Ratio*          1.63   2.67   2.88   2.38   1.75   2.59

* Ratio of each operator’s fastest to her slowest rate.


From Tables 2 and 3 it is seen that only one ratio of intra-individual
differences in Table 2 and two ratios in Table 3 were lower than the
corresponding ratio of inter-individual differences.
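
The comparison just described can be reproduced for the one-pound operation from the figures that survive in Table 3. This is a sketch only: the fastest and slowest rates are entered as printed (operators 2 to 7), the inter-individual ratio of 1.77 is the value reported in the text, and the variable names are introduced for illustration.

```python
# Fastest and slowest fifteen-minute rates as printed in Table 3
# (only operators 2 through 7 survive in the source).
fastest = {2: 78, 3: 80, 4: 92, 5: 76, 6: 91, 7: 88}
slowest = {2: 48, 3: 30, 4: 32, 5: 32, 6: 52, 7: 34}
INTER_RATIO = 1.77  # inter-individual ratio reported in the text for Table 3

# Intra-individual ratio: each operator's fastest rate over her slowest rate.
intra_ratios = {op: fastest[op] / slowest[op] for op in fastest}
below_inter = [op for op, ratio in intra_ratios.items() if ratio < INTER_RATIO]

for op, ratio in intra_ratios.items():
    print(f"Operator {op}: intra-individual ratio = {ratio:.2f}")
print(f"Operators whose ratio falls below {INTER_RATIO}: {below_inter}")
```

Two operators fall below the inter-individual ratio, which agrees with the count given in the text for Table 3.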

A few other analyses were made of these data and the results of these 
are summarized below.* Analyzing the distribution of production rates 
by days for each operator revealed that: (1) the six slower operators 
tended to have a common mode for any one day; (2) all operators tended 
to have a common mode for themselves from day to day (note the 
straight-line work curves previously described); (3) all operators tended 
to have a common mean for themselves from day to day; (4) all operators 
tended to have a constant relative variation from day to day; and (5) the 
faster operators showed relatively less variation in production rates from 
day to day than did the slower operators. All of these analyses, of 
course, were based on a small amount of data and can only be interpreted 
as suggesting trends. 


An Hypothesis regarding the Measurement of 
the Effectiveness of Incentives 


To the extent that the present limited data permitted, it was desired 
to investigate the plausibility of the various hypotheses concerning the 


* These analyses were made by inspection of many elaborate tables. The complete 
report of these is contained in the writer’s thesis filed at the University of Minnesota 
Library. 















measurement of “restriction of output” or, as it has been redefined here,
of the “effectiveness or ineffectiveness of the incentives” offered for
working at an optimal rate.

To make such an investigation required the assumption that the 
incentives for this particular group of employees were not very effective. 
The writings of Ford and Mathewson tend to justify that assumption 
even before any evidence is sought in this or any other situation. One 
inherent bit of evidence is the fact that these employees were paid a 
straight time wage, plus overtime, and that no incentive system was in 
operation. Most books dealing with motion and time studies describe 
many situations in which output, even among very experienced workers, 
increases greatly when an incentive system is substituted for a straight 
time wage. The low ratio between the mean rate of the fastest and the
slowest operators also suggests that the incentives were not wholly
effective. This ratio is 1.73 in this situation. Wyatt, Frost, and Stock
showed that this ratio increases as the incentives to work become more
effective (13). Seashore described how the range of individual
differences increases when different persons use different work methods
(10). The present employees did use different methods here, and still
the ratio was only 1.73 to 1.

This ratio, although large in terms of economic significance, is small 
when compared to the ratio that is customarily found in laboratory ex- 
periments on motor and manual skills. Possibly the larger ratio found 
in laboratory experiments, often greater than 3 to 1, is a reflection of the 
strong incentives operative in the laboratory situation. There is an 
enormous difference in motivation between the student who performs a 
laboratory manipulation for five or ten minutes and the industrial worker 
who performs his tasks hour after hour, day after day, and year after year. 

There is another respect, too, in which these industrial tasks and 
ratios may be compared with laboratory experiments, although the data 
are not very clear-cut in either of these instances. The reliability of 
most light manual and motor tests is generally 0.90 or higher. This 
means that any given individual will tend to get very close to the same 
result on a re-test. But the workers studied here did not show a close 
correspondence on their re-tests. That is, each individual worker here 
showed a large range of output rates, so large that, in most instances and 
for the same operation, the ratio of the range of intra-individual differ- 
ences was higher than the ratio of the range of inter-individual differ- 
ences. This would most likely not be true of a typical laboratory ex- 
periment on a similar type of manipulation. This leads to the hypothesis 
that the incentives to work may be considered ineffective when the ratio of the 
range of intra-individual differences is greater than the ratio of the range of 
inter-individual differences. 






















This hypothesis of industrial motivation is presented here as a tenta- 
tive one deserving further investigation. It is derived from a very small 
number of cases and there are some exceptions to it in these very data. 
It is derived with the aid of the assumption that the incentives were not 
wholly effective in this situation. There was, however, no clear-cut 
evidence of “restriction” among these operators. More than that, the
much higher productive rates of Operators 1 and 6 over the other
operators might better be taken as evidence that there was certainly no
“organized restriction.” But the assumption that the motivation was
not very high for most operators here is justified partly by the other 
writers mentioned above, partly by the impressions recorded by the 
investigators during the study, partly by the tendency towards a common 
mean rate of production among six of the operators, and partly by the 
tendency for the group members to vary their output together over any 
one day as reported in the previous paper. Further investigation is 
needed to test the value of this hypothesis. 


Received June 25, 1945. 


References 


1. Bedford, T. The ideal work curve. J. Industr. Hyg., 1922, 4, 235-245.
2. Bliss, E. F., Jr. Earnings of machine tenders and of bench workers. Person. J.,
1931, 10, 102-107.
3. Evans, W. D. Individual productivity differences. Month. Labor Rev., 1940, 50,
338-341.
4. Farmer, M., and Brooke, R. W. Motion study in metal polishing. London: Ind.
Hlth. Res. Bd., Rep. No. 15, 1921.
5. Ford, A. A scientific approach to labor problems. New York: McGraw-Hill, 1931.
6. Goulden, C. N. Methods of statistical analysis. New York: John Wiley and Sons,
1937.
7. Hull, C. L. Aptitude testing. New York: World Book Co., 1928.
8. Mathewson, S. B. Restriction of output among unorganized workers. New York:
The Viking Press, 1931.
9. Rothe, H. F. Output rates among butter wrappers: I. J. appl. Psychol., 1946, 30,
199-211.
10. Seashore, R. H. Work methods: an often neglected factor underlying individual
differences. Psychol. Rev., 1939, 46, 123-141.
11. Stead, W. H., and Shartle, C. L. Occupational counseling techniques. New York:
American Book Co., 1940.
12. Tiffin, J. Industrial psychology. New York: Prentice-Hall, 1942.
13. Wyatt, S., Frost, L., and Stock, F. G. L. Incentives in repetitive work. London:
Ind. Hlth. Res. Bd., Rep. No. 69, 1934.
14. Yoder, D. Personnel management and industrial relations. New York: Prentice-
Hall, 1942.


















Relation Between Scores on Certain Standard Tests and 
Supervisory Success in an Aircraft Factory * 


A. Q. Sartain 
Southern Methodist University 


The question of how to select supervisory personnel is frequently one 
of the most important faced by a business enterprise. Since success as a 
worker is no guarantee of success in supervision, it is natural that psy- 
chological tests should be considered as possible instruments for selection 
of suitable persons for supervisory responsibilities. 


Statement of the Problem 


The problem of this study was to determine the extent to which suc- 
cess in supervision in an aircraft factory was predicted by the following 
standard tests: Otis Self-Administering Test of Mental Ability (Higher 
Examination); Tiffin and Lawshe Adaptability Test (Form A); Revised 
Minnesota Paper Form Board; Bennett Test of Mechanical Comprehen- 
sion (Form AA); Remmers and File How Supervise? (Experimental 
Edition, Form A); Bernreuter Personality Inventory; and Kuder Prefer- 
ence Record. 


Subjects and Conditions of the Experiment 


The tests listed above were given to 40 members of supervision in the 
factory. Thirty-seven of these men were assistant foremen, and three 
were foremen. Each man was rated by the foreman and general foreman 
over him (except in the case of the foremen, where it was necessary to 
secure a second rating by the general foreman, the second rating being 
obtained about three weeks after the first). Each man was rated on two 
different rating forms, and the combination of the four ratings constituted 
the criterion of success. 


The Criterion 


In setting up the criterion, the ratings on each rating form were
converted to standard deviation scores, and the sum of these scores became
the criterion. An attempt was made in preliminary studies to insure
both the reliability and the validity of each rating form. One of these
forms (called Form A henceforth) consisted of the seven qualities which
had been found to correlate most highly with success as a supervisor,
each quality being listed on a separate sheet. In the preliminary study,
the correlation between the average of two ratings and the average of
two scores or grades given for success on the job was .88 (.84 when new
ratings were secured five weeks later), and the correlation between two
ratings for each man was .64. The number of employees involved was
43. Thus, it is concluded that Form A was sufficiently reliable and valid
to comprise a part of the criterion.


* The writer wishes to express his appreciation to the Texas Division of North
American Aviation, Inc., for supporting and making possible this study. Special
acknowledgment is made of the help of Mr. Ross A. Peterson, Director of Education.

The second rating form (Form B) consisted of ten qualities, all on a 
single sheet. In the preliminary study (N = 54), the correlation be- 
tween the average of two ratings and the average of two scores or grades 
on supervisory success was .92. The two ratings correlated with each
other to the extent of .63. Thus, it appears that Form B was also
reasonably reliable and valid.


Table 1
Correlations between Ratings Constituting Criterion

Ratings                                          r
Average Rating on A vs. Average on B            .79
First Rating on A vs. First on B                .77
Second Rating on A vs. Second on B              .62
First Rating on A vs. Second on B               .54
Second Rating on A vs. First on B               .48


It should be emphasized that the results just cited were from earlier 
studies of the rating forms. In the present study the results were hardly 
so favorable. Table 1 presents the relevant findings for this study. 
While these correlations are not as high as earlier studies might lead one 
to expect, they appear to be high enough to indicate that the combined 
ratings might well serve as the criterion. 
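
The construction of the criterion described above (each set of ratings converted to standard deviation scores, and the four sets then summed for each man) may be sketched as follows. The ratings shown are hypothetical, and the use of the population standard deviation, computed separately for each of the four sets of ratings, is an assumption; the text does not state which form of the standard deviation was used.

```python
from statistics import mean, pstdev


def standard_scores(ratings):
    """Convert one set of raw ratings to standard (z) scores.

    Assumption: the population standard deviation is used.
    """
    m = mean(ratings)
    sd = pstdev(ratings)
    return [(x - m) / sd for x in ratings]


def criterion(rating_sets):
    """Sum each man's standard scores across all sets of ratings."""
    z_sets = [standard_scores(ratings) for ratings in rating_sets]
    return [sum(man_scores) for man_scores in zip(*z_sets)]


# Hypothetical raw ratings for five men: two ratings on each of two forms.
form_a_first = [20, 25, 30, 35, 40]
form_a_second = [22, 24, 31, 33, 40]
form_b_first = [50, 55, 65, 60, 70]
form_b_second = [48, 58, 62, 64, 68]

scores = criterion([form_a_first, form_a_second, form_b_first, form_b_second])
for man, score in enumerate(scores, start=1):
    print(f"Man {man}: criterion score {score:+.2f}")
```

Converting to standard scores before summing gives each form and each rater equal weight in the criterion, regardless of differences in the scale or spread of the raw ratings.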


Results of the Study 


As Table 2 brings out, correlations between the test scores and the 
criterion were low, in every case so low as to lack statistical significance. 
(According to Fisher, for a coefficient of correlation to be significant at 
the 5% level of confidence under the conditions of this study it would 
have to be .304; at the 1% level of confidence it would have to be .393.¹)


¹ Guilford, J. P., Psychometric methods. New York: McGraw-Hill Book Co., Inc.,
1936, p. 549.










Table 2
Coefficients of Correlation between Test Scores and Criterion

Test                                             r
Otis Self-Administering                         .04
Adaptability                                   -.07
Minn. Paper Form Board                          .10
Bennett Mechanical Comprehension               -.15
How Supervise?                                 -.18
Bernreuter Personality Inventory
    B1-N                                       -.11
    B4-D                                        .12
    F1-C                                        .01
    F2-S                                        .07
Kuder Preference Record*
    Mechanical                                  .004
    Social Service                             -.06
    Clerical                                    .003

* The plant was closed before this study was concluded, and the data on the other
interest scales of the Kuder test inadvertently destroyed.


These low correlations may be due to a faulty criterion. It seems more 
probable, however, that the tests simply fail to correlate with supervisory 
success in this plant. 
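
As an illustrative check only, the coefficients of Table 2 may be compared with the two significance values quoted in the text (.304 and .393). The sketch below also computes the usual t ratio for a correlation coefficient, t = r * sqrt(N - 2) / sqrt(1 - r^2), for the largest coefficient. The variable names are hypothetical, and the t ratio is a standard formula added here for illustration; it is not reported in the study.

```python
import math

N = 40                # number of supervisors tested
R_5_PERCENT = 0.304   # value quoted in the text for the 5% level
R_1_PERCENT = 0.393   # value quoted in the text for the 1% level

# Coefficients as reported in Table 2.
table_2 = {
    "Otis Self-Administering": 0.04,
    "Adaptability": -0.07,
    "Minn. Paper Form Board": 0.10,
    "Bennett Mechanical Comprehension": -0.15,
    "How Supervise?": -0.18,
    "Bernreuter B1-N": -0.11,
    "Bernreuter B4-D": 0.12,
    "Bernreuter F1-C": 0.01,
    "Bernreuter F2-S": 0.07,
    "Kuder Mechanical": 0.004,
    "Kuder Social Service": -0.06,
    "Kuder Clerical": 0.003,
}


def t_statistic(r, n):
    """Standard t ratio for testing a correlation coefficient against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Tests whose coefficient reaches the quoted 5% value, in absolute size.
significant = {name: r for name, r in table_2.items() if abs(r) >= R_5_PERCENT}

largest_name, largest_r = max(table_2.items(), key=lambda item: abs(item[1]))
largest_t = t_statistic(largest_r, N)

print("Reaching the 5% value:", significant or "none")
print(f"Largest coefficient: {largest_name} (r = {largest_r}, t = {largest_t:.2f})")
```

No coefficient in the table reaches either quoted value; the largest in absolute size is that for How Supervise?.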

Correlations were obtained between some of the test scores, and are
presented in Table 3. The correlation between the two general mental
ability tests (.86) and those between the mechanical ability tests and the
general mental ability tests (.33 to .41), as well as that between the two
mechanical ability tests (.31), are not far different from those found in
most similar studies.² The correlation between Adaptability and How
Supervise? indicates that general mental ability goes with favorable
supervisory attitudes (low scores on this test indicating a favorable
attitude) to a moderate degree. Other coefficients are too small to have
significance.


Table 3
Coefficients of Correlation between Certain Test Scores

Tests                                            r
Adaptability vs. Otis                           .86
Adaptability vs. How Supervise?                -.44
Adaptability vs. Form Board                     .33
Adaptability vs. Bennett                        .41
Otis vs. Form Board                             .39
Otis vs. Bennett                                .37
Bennett vs. Form Board                          .31
How Supervise? vs. Kuder Persuasive             .00
How Supervise? vs. Kuder Social Service         [illegible]
Kuder Mechanical vs. Form Board                 .13
Kuder Mechanical vs. Bennett                    .15
Kuder Scientific vs. Form Board                 .19
Kuder Scientific vs. Bennett                    .15


Additional Studies 


Two other studies were made of the success of the Otis and Bernreuter
in selecting supervisors. In one of these, the sum of the scores on both
the rating scales was again used as the criterion of success, two ratings
on each form being secured on 85 men. Table 4 is based on this study.
It is clear that the coefficients are most likely due to chance.


Table 4
Relation of Bernreuter and Otis Scores to Rated Success in Supervision

Test                                             r
Otis                                            .16
Bernreuter
    B1-N                                       -.12
    B4-D                                        .04
    F1-C                                       -.09
    F2-S                                       -.02


Table 5
Comparison of Bernreuter and Otis Scores of Groups of Good and Poor Supervisors

                     Poor Group                        Good Group
Test or                                                                    Critical
Scale     No.     Mean    S.D.   S.D.m      No.     Mean    S.D.   S.D.m     Ratio
B1-N       29   -127.1   61.20   11.56      24   -146.4   46.30    9.66      1.29
B4-D       29     85.6   48.50    9.16      24    108.3   65.10   13.58      1.39
F1-C       29    -95.0   73.05   13.77      24   -109.3   51.90   10.80       .82
F2-S       29    -36.5   44.20    8.35      24    -38.8   48.90   10.18       .17
Otis       28    101.1   13.03    2.51      24    105.1   10.08    2.10      1.22


In the second study, 53 members of supervision who were known well
to three individuals in management positions were divided into two
groups, good supervisors (N = 29) and poor supervisors (N = 24).
The members of each group were selected because there was agreement
among those classifying them that they belonged in one or the other group.
When the Bernreuter and Otis scores of these two groups were compared,
the results shown in Table 5 were obtained. It will be noted that the
differences all favor the good supervisors, that is, that they appear to be
more intelligent, more stable, more dominant, more self-confident, and
more sociable, but that no difference even approaches statistical
significance.


² Greene, E. B., Measurements of human behavior. New York: Odyssey Press, 1940,
p. 257; p. 361.




Summary and Conclusions 


The following tests were administered to forty members of supervision
in an aircraft factory: Otis Self-Administering Test of Mental Ability 
(Higher Examination); Tiffin and Lawshe Adaptability Test (Form A); 
Revised Minnesota Paper Form Board; Bennett Test of Mechanical 
Comprehension (Form AA); Remmers and File How Supervise? Test 
(Experimental Edition, Form A); Bernreuter Personality Inventory; and 
Kuder Preference Record. Two ratings on each of two rating forms were 
then secured for each man, the rating forms previously having been 
checked for reliability and validity, and the sum of the four ratings 
(reduced to standard deviation scores) became the criterion of success. 
Test scores were then correlated against the criterion. In every instance 
the coefficients obtained were too low to be considered significant, the 
highest one being only .18. It was concluded, therefore, that these tests 
had little or no predictive value for success in supervision in this plant. 


Two additional minor studies corroborating this conclusion in part 
are also reported. 










Received August 31, 1945. 



















Test Validation on Remote Criteria 


Doncaster G. Humm 
Personnel Service, Los Angeles, California 


The demonstration of the usefulness of psychological tests in an ap- 
plied situation, such as selection for employment, is a problem of test 
validation on remote criteria. Test validation on immediate criteria, of 
course, involves the demonstration of the effectiveness of a test’s ability 
to measure that which it is meant to measure. Thus, a skill test is 
validated on an immediate criterion if it is shown to be capable of meas- 
uring skill; an intelligence test, to measure intelligence; an interest inven- 
tory, to measure interest; and so forth. 

However, there are some situations in which some demonstration of a
test’s usefulness to the situation itself needs to be shown. Before an
intelligence test properly may be considered a part of the battery for the
selection of employees, there must be some demonstration that
intelligence is a factor in success on the job and that the test being considered
measures intelligence in such a situation. The result is that test
validation on remote criteria sometimes becomes necessary.

A rigorous solution of such a problem requires careful consideration of 
basic assumptions plus the provision for adequate safeguards against the 
intrusion of extraneous factors. It is, in fact, a complicated scientific 
experiment which must be set up and carried through with due regard to 
the requirements of scientific method. 

Two major attacks on such an experiment are possible: (1) the em- 
ployment of logical analysis, and (2) the employment of quantitative 
methods. Either of these attacks will be successful if it is essentially 
factual, carefully controlled, and adequately thorough. 

The procedure employing logical analysis may well start with a 
factual consideration of all of the characteristics needed for success. 
Thus, if the problem is the selection of combat flyers, it will be necessary 
to isolate the causes of failure and also the factors leading to success. 
In many cases, the latter factors present opposite phases of the former; 
but this cannot be taken for granted. Some of the factors leading to 
success are separate and distinct from those leading to failure. The 
demonstration of both the qualities required for success and those pre- 
disposing to failure must, of course, be proved beyond reasonable doubt. 
Thus, if intelligence is to be a factor in such a situation, there must be
some demonstration that certain ranges of intelligence are accompanied
by success and certain are accompanied by failure. It is, however,
sufficient to demonstrate by factual job analysis that the characteristics are
important in selection.

The quantitative consideration of such a problem may be attacked in 
either of two ways: (1) all of the critical factors contributing to or hinder- 
ing success in the situation except the factor to be measured by the test 
under examination may be equated and held constant; or, (2) all of the 
factors may be measured, their interrelationships with success and with 
each other determined, and the effect of all factors except that under 
consideration partialed out. 
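
The second method, that of partialing out, may be illustrated for the simplest case of a single additional factor by the ordinary first-order partial correlation formula. The sketch below is a minimal example under assumed figures: the three correlations are hypothetical and are not taken from any study cited here.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with the effect of z partialed out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations:
# x = test score, y = success on the job, z = some other factor.
r_test_success = 0.20
r_test_other = 0.50
r_success_other = -0.30

r_partial = partial_correlation(r_test_success, r_test_other, r_success_other)
print(f"Direct correlation:  {r_test_success:.2f}")
print(f"Partial correlation: {r_partial:.2f}")
```

With these assumed figures the relation between test and success, once the other factor is held constant, is about twice as large as the direct correlation suggests. Where several factors must be partialed out at once, higher-order formulas or multiple regression are required.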

As can be readily seen, it is almost impossible to separate the two 
general types of attack first mentioned, since logical analysis and quanti- 
tative considerations are both considered in the second and since, also, 
quantitative methods may often be successfully used in the first. 

The point to be emphasized, however, is that the task is not simple. 
The direct correlation between an intelligence test and success on the job, 
for example, is not justified; since it is possible that other factors may 
vary to the extent that zero or negative correlation between intelligence 
and success on the job may be obtained when actually intelligence may 
be a decisive factor. This is quite likely to happen in jobs which require 
low-average intelligence, since it has been demonstrated that intelligence 
too high for the job is equally as handicapping as intelligence too low for 
the job. As a consequence, the relationship between success on the job 
in such job brackets and scores in intelligence tests is likely to be curvili- 
near, with greater tendency to report high intelligence in negative fashion 
than low intelligence. Incidentally, the same result is likely to happen 
with regard to skill tests and aptitude tests. 
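
The effect of a curvilinear relationship on the direct correlation may be seen in a small constructed example. In the sketch below the data are hypothetical: success is made to depend entirely on how far a worker's intelligence score departs from an assumed optimum of 100, yet the ordinary product-moment correlation between intelligence and success is zero.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


OPTIMUM = 100  # assumed optimum intelligence score for the job

# Hypothetical workers with scores from 80 to 120.
intelligence = list(range(80, 121, 5))
# Success falls off equally on either side of the optimum.
success = [100 - (score - OPTIMUM) ** 2 / 4 for score in intelligence]

r_direct = pearson_r(intelligence, success)
distance = [abs(score - OPTIMUM) for score in intelligence]
r_distance = pearson_r(distance, success)

print(f"Intelligence vs. success:          r = {r_direct:.2f}")
print(f"Distance from optimum vs. success: r = {r_distance:.2f}")
```

The direct coefficient fails to show any relationship, although in this constructed case intelligence wholly determines success; the relationship appears only when departure from the optimum is examined.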

Any attempt to oversimplify the attack on the validation of tests on 
remote criteria is very likely to bring discredit on the test unjustly. 

Let us consider an example of the correct way to examine the selective 
value of a test, taking, for instance, the Minnesota Spatial Relations Test 
for assembly workers. If we assume that intelligence, interest, tempera- 
ment, physical fitness, previous experience, and quality of supervision are 
all factors of success or failure on this job, we may proceed in one of two 
ways. In the first, one should provide that these other factors are all so 
well equated as to permit us to examine the effect of the Minnesota test 
uninfluenced by their variations. As an alternative, we may set up 
measures for all of these factors, determine the relationship between these 
various factors and success on the job and each other, and partial out 
their effect on the Minnesota test. 

Note that the first thing we had to do was to make an assumption as
to the factors important to success or failure on the job. There is,
however, no justification for making those assumptions, unless they have
been proved by some previous examination. As a consequence, since
the second of the two alternatives provided justification for such
assumptions, it is probably more likely to return valid results than the first.

One needs only to consider the end results of such a careful
consideration of many factors and contrast them with the original correlation
between success on the job and the test under consideration to see how
dangerous it is to attempt to use the latter measure alone and without
examining these other factors.

There is one major drawback to the employment of this second type 
of attack on the problem. It is that it involves the use of multiple re- 
gression equations of high order which must be solved by determinants 
so complicated that solution borders on the impossible. 

Let us use as an example the problem of studying the effectiveness of 
the Kuder Preference Record in the selection of salesmen. Again, let us 
assume that intelligence, skill, temperament, physical fitness, previous 
experience, and quality of supervision are additional factors. There are 
nine factors to be considered in the components reported by the Kuder, 
seven additional factors if we use the Humm-Wadsworth Temperament 
Scale for temperament, and at least one factor for intelligence, skill, 
physical fitness, previous experience, and quality of supervision. In 
all, this makes 21 factors to be considered with relation to the criterion 
of success on the job; a task requiring a determinant of the 21st order, or, 
in other words, a set of 21 simultaneous equations containing 21 unknowns. 
While the solution is possible, it is extremely difficult, especially if the
error in rounding off is adequately considered.¹
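
The set of simultaneous equations referred to above may be illustrated on a small scale. In the usual formulation, assumed here, the standardized regression weights are found by solving a set of equations whose coefficients are the intercorrelations of the predictors and whose right-hand sides are the correlations of the predictors with the criterion. The sketch below solves such a set by elimination for two hypothetical predictors; the same routine applies in principle to 21 unknowns, although the labor and the accumulation of rounding error increase rapidly. All figures and names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so that the inputs are left unchanged.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


# Hypothetical intercorrelations of two predictors ...
predictor_r = [
    [1.0, 0.5],
    [0.5, 1.0],
]
# ... and their hypothetical correlations with success on the job.
criterion_r = [0.6, 0.3]

beta = solve_linear_system(predictor_r, criterion_r)
multiple_r_squared = sum(b * r for b, r in zip(beta, criterion_r))

print("Standardized weights:", [round(b, 3) for b in beta])
print("Squared multiple correlation:", round(multiple_r_squared, 3))
```

In this constructed case the second predictor, although it correlates .30 with the criterion, receives a weight of zero once its overlap with the first predictor is taken into account.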

For ready solution, therefore, the problem must be considerably sim- 
plified. It is possible to accomplish this by considering the Kuder as a 
unit on a scale of relative pertinence and the measure of temperament on 
relative acceptability of the behavior tendencies reported. Some of the 
other factors then may be eliminated by equating them. For example, 
only those salesmen may be considered who are physically fit, satisfactory 
in experience, and well adjusted to their supervisor. In this way, the 
number of unknowns may be reduced to four and the solution of the 
determinant simplified. 

The point is, however, that these factors have all received considera- 
tion. Failure to do this will result in inaccurate findings. 


¹ The error in rounding off may lead to very inaccurate results if not adequately
considered. Thus, if 50 ± .5 is multiplied by 50 ± .5, the product equals 2,500.25
± 50. Similarly, if 1,440 ± .5 be divided by .012 ± .0005, the quotient is
approximately 120,000 ± 5,426 or somewhere between 125,426 and 114,574.
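
The first case in the footnote may be verified by carrying the extreme values of each rounded figure through the multiplication. The sketch below is illustrative; the function name is hypothetical, and it treats the rounding error as a simple interval about each figure.

```python
def product_bounds(a, a_err, b, b_err):
    """Smallest and largest products when each factor may err by the given amount."""
    corners = [
        (a + sign_a * a_err) * (b + sign_b * b_err)
        for sign_a in (-1, 1)
        for sign_b in (-1, 1)
    ]
    return min(corners), max(corners)


low, high = product_bounds(50, 0.5, 50, 0.5)
midpoint = (low + high) / 2
half_width = (high - low) / 2

print(f"Product lies between {low} and {high}")
print(f"That is, {midpoint} plus or minus {half_width}")
```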













Some factors of success and failure are difficult to deal with quantita- 
tively. One of these is the quality of supervision. This is an extremely 
important and yet a frequently neglected consideration. It is not easily 
dealt with for the reason that the matter of incompatibility often enters
the picture. Thus, a foreman may actually be an excellent foreman and 
the worker an excellent workman and both may get along well with others 
on the job, but their two temperaments may clash to such an extent that 
they seem unable to work together without striking sparks. 

Another factor that often enters the picture is that of compensatory 
mental reactions or the influence exerted on one characteristic by other 
characteristics. This may be manifested either positively or negatively. 
Thus, an individual worker may have fairly low skill and yet perform very 
adequately by reason of the fact that he is painstakingly careful or is 
highly intelligent or is an assiduous worker. Contrariwise, an individual 
may have very superior skill; but, because he is too intelligent for the 
job, may be so bored as not to put forth the effort to manifest that skill. 

Compensations in the field of interest are especially frequent. An 
examination of the interest patterns of any group of workers may indi- 
cate, if interest alone be considered, that a number of these workers are 
misplaced. This may or may not be true. The fact is that many of 
these individuals may be securing sufficient compensation for their lack 
of interest in the job by some avocational pursuit which gives adequate 
expression for that interest, and other incentives for staying on the job 
may be strong enough to make them wish to remain. Thus, if interest 
examinations are measured against success on the job, the real part that 
interest plays in that success is difficult to measure. 

One of the drawbacks to the validation of tests on remote criteria is 
the fact that some factors in success and failure cannot be readily taken 
into account. Thus, it is very difficult to consider the effect of home 
conditions, outside social adjustments, worry about finances, and the 
like. At some times these are very real factors in success or failure of the 
individual. 

On the whole, then, we see that the problem of considering the effectiveness
of a test in its applied situation is extremely complicated and
difficult. Unfortunately, the psychological literature reports many ex- 
amples of failure to consider these complications. In many instances, 
the requirements of scientific experiment are seriously overlooked. If 
these examples were the exemplification of the work of tyros, a paper such 
as this perhaps would not be necessary; but the fact is that some psy- 
chologists of high rank have used remote criteria without adequate pre- 
caution. These examples will illustrate: 
















1. A report made on the Guilford-Martin Inventory (on the basis of 
48 cases) assumed that behavior was an exclusive resultant of 
temperament (behavior tendency) and neglected the trigger effects 
of intelligence, skill, and supervision. 

2. A study made of the Humm-Wadsworth Temperament Scale (59 
cases) reported it as ineffective because, when used alone, it did 
not differentiate between good and poor salesmen. No controls 
whatever were set up for experience, intelligence, skill, etc. 

3. A critique of the Bernreuter Inventory (on the basis of 95 normal
and 329 abnormal soldiers) reported that “raw scores were not
significantly differentiating.” No interrelationships were used,
and no other factors, aside from the inventory, were considered.

4. On the basis of two cases and eight months follow-up of one case,
it was concluded that the “Rorschach seems to have high validity
in this type of employee selection.” No other evaluation
techniques were mentioned.

5. Seven sub-tests selected from the Bennett, McQuarrie, and 
O’Rourke were found to have a multiple correlation of + .47 with 
instructor’s ratings of success in mechanical training course of 147 
high-school students. No equating of intelligence or consideration 
of other factors was reported. 


We have already pointed out that the problem of test validation on 
remote criteria will require some logical analysis as well as quantitative 
treatment. This follows from the fact that applied mathematics requires 
both a sound knowledge of mathematics and a thorough familiarity with 
the subject matter. Indeed, a mathematical treatment is no more than 
a logical manipulation which proceeds from certain assumptions to
conclusions. It follows that mathematical treatment must be founded on
that good sense which is common to the subject matter. This again is a 
requirement which often has failed to be met in psychological literature. 
There are too many examples of psychologists conducting research on the 
validity of tests without beforehand making themselves thoroughly 
acquainted with the work which has already been done on the test. 
Some recent publications on the Bell Inventory, the Kuder Preference
Record, the Bernreuter, and other tests will illustrate this point. 

It is also to be regretted that we have so many reports that are based 
on an insufficient number of cases to establish statistical momentum. 
In fact, all, except possibly one, of the five examples previously quoted 
illustrate this error. 

A suggested remedy for this situation is a more extended use of logical
analysis. The writer would be one of the last to decry the use of
quantitative method, and yet one of the first to criticize its unwarranted or
superficial use. Mathematics is probably the queen of the sciences;²
yet, mathematics does not constitute the only way to think.

In a great many situations, actually, it is a waste of effort to validate a 
psychological test on such a remote criterion as success on the job, since 
there are certain qualities which may safely be assumed as being prere- 
quisite for the job. Among such qualities that are at least to a consider- 
able extent desirable are commensurate skill, pertinent interests, good 
mental health, freedom from anti-social tendencies, and adequate physi- 
cal health. 

A critical analysis of many of the reports on the usefulness of tests 
will reveal that the validity study might better have been made on an 
immediate rather than a remote criterion. For example, if we can assume 
that fine dexterity is necessary for the assembly of watches and we have 
adequate proof that the Purdue Peg Board is a good measure of fine 
dexterity, then it must follow that the Purdue Peg Board is likely to 
prove a valuable member of the test battery for watch assemblers. 
Similarly, if we can assume that a cashier must be trustworthy and honest 
and we have a test which validly measures honesty and trustworthiness, 
then it follows that such a test is very likely to help in the selection of 
cashiers. 

Such a direct attack on the problem is often likely to be more success- 
ful than an attack through the use of remote criteria. 

In many instances in the industrial situation, the case-study method 
may be used to an advantage. This is especially true in the study of 
failures. The writer assisted in such a study. It included 330 problem 
employees in a public service company. There the case-study method 
revealed that approximately 6 per cent of the failures were explained on 
the basis of unfitting intelligence, 6 per cent on the basis of skill, 6 per 
cent on the basis of physical fitness, 80 per cent on the basis of tempera- 
ment, and 2 per cent for miscellaneous reasons. Similar case studies of 
problem employees are very likely to reveal characteristics that are 
important, at least for their elimination in selection procedures. 

If such studies are followed by case studies of outstanding employees, 
the observations already made may be verified and information of ad- 
ditional importance revealed. 


² As Eric Temple Bell has said.


Summary


The component tests of a test battery cannot be directly validated on
success in the situation or the job unless the requirements of scientific
experiment are rigorously met. This implies either that all other factors
except the one under consideration are equated and kept constant or that
adequate mathematical safeguards (including provision for taking care
of the error of rounding off) are set up to measure all important variations.
Such a task requires both logical analysis and mathematical treatment.
Neither may be neglected.

Where the situation or the job is adequately analyzed and where the 
characteristics needed for the job are clearly established, it is better to 
validate tests on immediate criteria. That is to say, it is better to as- 
certain whether or not the tests are effective in measuring that which they 
are meant to measure. While this requires no less rigor in mathematical 
treatment, it is such a simpler task that the results are more likely to be 
found in accordance with the facts. 


Received October 2, 1945. 




























The Development and Standardization of a New Type 
Test of Peripheral Vision * 


John Allan McClure 
Division of Education and Applied Psychology, Purdue University 


Recent studies in industry have proven the value of proper visual 
skills in relation to job performance, quality of workmanship and accident 
experience. Results indicate that one of the visual skills, peripheral 
vision, not previously included in the research, might prove to be an 
important factor in jobs requiring this skill, such as crane operator, truck 
driver, and industrial tractor operator. In order that research may be 
conducted to determine the importance of peripheral vision in industry 
the new type perimeter described in this report was developed. 

The present study is concerned with the determination of the reli- 
ability of the instrument, the relationship of peripheral vision to other 
visual skills, and the accumulation of standard norms. 


Development of the Perimeter 


Knowledge of the field of vision and a realization of its limits have 
been indicated in the literature from the time of the Greeks and Romans. 
Thomas Young (10) in 1801 made the first accurate study of the field of 
vision. He listed the outer limits of the normal field at 90° on each side. 
Purkinje (6) reported in 1825 an outer limit of 100° and 110° with the 
pupils dilated. Von Graefe (9), in 1855, was the first to report a study of 
the visual fields for diagnostic purposes. He used a simple campimeter 
consisting of a small blackboard with a piece of chalk on the end of a wire 
as his test object. The first campimeter developed by Aubert and
Foerster (2) consisted of a flat surface with letters or figures arranged 
around a fixation point. The experiment was conducted in a dark room 
with the flash from an electrical discharge illuminating the surface. The 
next instrument developed by them consisted of a flat strip fastened to an 
upright in such a way that it could be rotated. From this later instru- 
ment Foerster (2) developed, in 1869, the curved arc instrument that is 
basically the perimeter so widely used today. 

The measurement of peripheral vision on the arc perimeter is
influenced by the size of the test object, its brightness, its color, its distance
from the eyes, its background, the exposure time, and the amount of
illumination. The subject introduces variability in his attention and the
light adaptation of his eyes. Most of the attempts toward improvement
of the perimeter have been to control and standardize these conditions.


* This article is based on the author’s thesis of the same title submitted to the
faculty of Purdue University in partial fulfillment of the requirements for the degree of
Doctor of Philosophy, February, 1946. The thesis was directed by Dr. Joseph Tiffin.

Pascal (5) recently described an arc perimeter on which he uses a
changeable fixation target and an illuminated test object. He uses a
manually operated switch to flash the lights on and off. Mayer (4)
described a test object, mounted on the arc of an ordinary perimeter, that
is illuminated by a neon tube with a flash speed of .025 second.
Burnham (1) developed a perimeter that he called a perihemisphere because
it uses a hemispherical aluminum shell in which the test object can be
located anywhere in the visual field. Color filters can be placed in the
test object which is illuminated by a flashlight. The interior of the
shell is painted white and brightly illuminated to present a uniform field.

In spite of the many attempts to improve the design for ease of opera- 
tion and increased objectivity, two leading clinical perimetrists, Traquair 
(8) and Thomasson (7), prefer to use a simple adaptation of the original 
Foerster perimeter. Traquair (8) points out that perimetry is a highly 
subjective form of examination, an examination of the subject’s sensa- 
tions as described by himself in answer to questions put to him by the 
observer, and any success in the results is dependent more on the skill 
and knowledge of the experimenter than on the instrument used. 


Previous Related Studies 


Low (3) has reported a study dealing with peripheral vision and certain
relationships between peripheral vision and other visual functions.
He points out that a number of war-time accidents, indicating faulty
peripheral vision as a causative factor, prompted him to develop a
reasonably short, accurate test of this visual function and to investigate the
possibility of improving it by training. He used an ordinary arc
perimeter with test objects of varying diameters.


The Apparatus and Test Procedure 


Figure 1 shows the perimeter in use. It consists of a base, two 
swinging arms mounting lamps that contain the test objects, a stationary 
arm in the center mounting the fixation target and its lamp, protractors 
for measuring the angle of each swinging arm, enclosures, head rest, and 
the necessary wiring. The light control, an independent unit, consists 
of a double throw switch that turns the stimulus lamps on steady or into 
the flash timing circuit, a selector switch that turns the side test object 





342 John Allan McClure 


lamps on in various combinations with the center lamp, and an
electronically operated flash timing switch.

The experimental model operates only in the horizontal or temporal 
plane. It was made this way to simplify construction and yet allow for 
an evaluation of the basic concepts incorporated in the instrument. 

A schematic drawing of the top view of the perimeter is shown in 
Figure 2. The examiner’s control levers and protractors located under 
the center fixation target are not included in the drawing. All three 





Fig. 1. The perimeter in use.


lamps are lighted with seven watt 110 volt candelabra bulbs. In the 
side or test object lamps the light is filtered and diffused through a dark 
filter and opal glass. On each side of the opal glass is a black opaque 
paper diaphragm with a centrally located 3/16 inch diameter hole. These 
apertures on the opal glass serve as the test object when the light is 
turned on behind them. The opal glass is 17% inches from the eye. 
The test object size, given as a visual angle, is 37 minutes. This light 
source of low intensity is directed toward the eyes through one inch 
diameter tubes that are lined with lampblack to reduce reflection. 
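
The relation between aperture size, viewing distance, and the stated visual angle can be checked with the usual formula, angle = 2 arctan(d / 2D). The following is a minimal sketch only. The aperture diameter (3/16 inch) and viewing distance (17.5 inches) used here are assumed, illustrative values, not confirmed readings of the source; they were chosen because they give a result close to the 37 minutes reported. The function and constant names are hypothetical.

```python
import math


def visual_angle_arcmin(diameter_in: float, distance_in: float) -> float:
    """Visual angle, in minutes of arc, subtended by an object of the given
    diameter viewed face-on from the given distance (same length units)."""
    angle_rad = 2.0 * math.atan(diameter_in / (2.0 * distance_in))
    return math.degrees(angle_rad) * 60.0


# Assumed dimensions for illustration; not confirmed by the source.
ASSUMED_APERTURE_IN = 3 / 16
ASSUMED_DISTANCE_IN = 17.5

angle = visual_angle_arcmin(ASSUMED_APERTURE_IN, ASSUMED_DISTANCE_IN)
print(f"{angle:.1f} minutes of arc")
```

Under these assumptions the computed angle falls just under 37 minutes, which is consistent with the figure given in the text.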

The center lamp has a one-half inch diameter aperture. A disc 

















New Type Test of Peripheral Vision 








Fig. 2. Schematic drawing of the perimeter showing the relationships of the various parts. (Top view diagram of the electric perimeter, showing the position of the eyes.)



















located transversely in the tube in front of the center lamp mounts eight 
targets. These fixation targets are opaque one-quarter inch numerals 
on tracing paper held between ground glass covers. The disc is notched 
on its periphery so that the targets index accurately when the disc is 
rotated by hand. A small beam from a light under the center target 
illuminates the front of the center target aperture so that the subject can 
determine where to direct his attention between flashes of the light 
stimulus.

The headrest is part of the center metal enclosure. The enclosure is 
shaped so that when the subject’s face is pressed slightly into the head- 
rest, outside light is excluded, and the subject’s eyes are positioned cen- 
trally in relation to each side lamp. The swinging arms are moved by 
levers on the protractors. The levers and protractors are located at the 
examiner’s position in front of the instrument. 

The circuit is wired so that the center lamp always lights. The test 
object lamps can be turned on with the center lamp in combinations of 
right-center, left-center, both-center, or neither-center. The electronic 
flash timer consists of a transformer, resistors, capacitors, an electronic 
tube and a relay. The wiring diagrams of these circuits can be found in 
a thesis¹ located in the Purdue University Library. 

The timer was set for a flash duration of one-tenth of a second. In 
preliminary trials it was found that this time gave sufficient exposure 
without allowing the subject time to shift his eyes. 

The principal features built into the perimeter are: 1. Both eyes are
tested simultaneously, but the right eye cannot see the left field nor the
left eye the right field; 2. The subject must focus his attention on the
center fixation target to read the number flashed; 3. The experimenter
can determine whether the subject actually sees the test objects or is
guessing; 4. The test can be given in less than ten minutes in its present
form; 5. The test is not uncomfortable or fatiguing to the subject;
6. The intensity and duration of the light stimulus can be carefully
controlled; 7. The procedure and purpose are readily understood by the
subject; 8. The field of vision is enclosed to reduce the effects of
surrounding illumination; and 9. The instrument is readily portable.

The protractors are set to measure the side angle from straight ahead 
of the subject’s eyes. Actually, when the eyes converge on the center 
fixation target the field angle for each eye is increased four degrees and 
twenty minutes. When comparing the results of a perimeter that 
measures one eye at a time with the results of this instrument this fact 
must be considered. 
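
The correction described above can be written out as a small conversion. This is a sketch under the assumption that the four degrees and twenty minutes is simply added to each protractor reading; the function names are hypothetical and are not part of the original procedure.

```python
# Increase in field angle for each eye when the eyes converge on the
# center fixation target: four degrees and twenty minutes (from the text).
CONVERGENCE_CORRECTION_DEG = 4 + 20 / 60


def field_angle_from_protractor(reading_deg: float) -> float:
    """Field angle relative to the eye's line of sight, in degrees,
    assuming the correction is added to the protractor reading."""
    return reading_deg + CONVERGENCE_CORRECTION_DEG


def to_degrees_minutes(angle_deg: float) -> tuple:
    """Split a decimal angle into whole degrees and minutes of arc."""
    total_minutes = round(angle_deg * 60)
    return divmod(total_minutes, 60)


print(to_degrees_minutes(field_angle_from_protractor(90)))
```

A protractor setting of 90°, for example, would correspond to a field angle of 94° 20′ for comparison with a perimeter that measures one eye at a time.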


¹ McClure, John Allan. The development and standardization of a new type test for
peripheral vision. Ph.D. Thesis, Purdue University Library, February, 1946.










The instrument was originally made with the center fixation target 
similar to the side test objects. To insure center fixation of the eyes and 
prevent falsification of responses, the test objects and center target first 
were made to flash successively, with the subject required to repeat the 
order in which the three lamps flashed. A motor driven brush contacting 
three adjustable contacts gave the sequence. Three tandem selector 
switches were wired into the contact circuit to give all six combinations 
of sequence for the three lamps. This method of presenting the stimulus 
was discarded because, as the test objects approached the peripheral 
vision threshold, the subject could not remember the sequence of flashes 
although he could clearly distinguish that a sequence had occurred. 
The test given in this manner seemed to be more a test of a special type 
of memory for perceptual experiences than a test of the field of vision. 
This method of administration was therefore abandoned. 

Enclosure of the field of vision was found to be necessary because 
various external light sources affected the results when the instrument 
was not enclosed. 

Test Procedure. The subject is seated on a stool that is adjusted to 
the correct height with the subject’s eyes level with the headrest of the 
instrument. The subject is asked to remove his glasses if he wears them. 
The subject is told that the instrument is a perimeter for measuring how 
far to each side he can distinguish a dim flashing light while looking 
straight ahead. He is asked to press his face into the headrest so that 
his eyes are comfortably centered and so that no light enters around his 
face. The swinging arms are positioned at 45° from straight ahead. 
All lamps are turned on, after which the subject is asked what number 
he reads in the center target. He is then asked if he sees a small dim 
light on each side. Each side lamp is moved slightly while the subject is 
asked which one is moving. The examiner does not proceed until he is 
certain that the subject recognizes positively these side test objects. 
When testing those individuals with extremely narrow fields the arms 
are moved in closer than 45°. The purpose of the lighted aperture is 
explained briefly and demonstrated with a flash of light. The lamps 
are turned into the flash circuit. With each lamp set on 45° the combina- 
tions are explained while flashing center-right, center-left, center-both, 
and center-neither lamps. The subject is asked if he followed the com- 
binations correctly. If necessary, the examiner again demonstrates and 
explains until this part is thoroughly understood. The subject is told 
to respond by telling what number he reads in the center target and which 
of the side lamps, if any, flash. 

The selector switch is set to flash both side lamps. The examiner 
says, “Ready,” just before he flashes the lights. If the response is correct
the lamps are moved to 65° and again flashed. This large initial 
increase in the angle works well with the average subject in speeding up 
the testing procedure. During the practice trials both side lamps are 
flashed except when there is indecision on the part of the subject. In 
such cases the increments are smaller and more variations in the lamp 
combinations are given. If the response is correct on the 65° setting, 
the side lamps are moved to 75°, and then to 85°. From there on the 
increment is by five degree intervals. Both arms are always set at the 
same angle from straight ahead. When a setting is reached where the 
subject starts to give incorrect responses for either eye, or reports that 
he fails to see the test objects, three or four extra trials are given to be 
sure that the subject’s threshold, on one or both of his eyes, has been 
passed. The arms are then brought forward five degrees to a smaller 
angle and four or five check trials are given. When the subject responds 
correctly on these practice trials the test trials are begun. 

Table 1 shows a typical record sheet. The series of ten trials shown 
under stimulus is given and the response of the subject recorded under 











Table 1
A Typical Individual Record Sheet

Record Sheet
Name: Mary Smith        Class: Psych. 1-B        Date: 1/21/46

   Angle                           Stimulus
Left  Right    1     2     3     4      5     6     7     8     9      10
               Both  Both  Both  Right  Left  Both  Both  Left  Right  Both

First Test
 85    85      B     B     B     R      L     B     B     L     R      B
 90    90      B     B     B     R      L     B     B     R     R      B
 95    95      R     R     R     R      R     R     R     O     R      R
100   100      O     O     O     O      O     O     O     O     O      O
Score: 90L 95R

Retest
 85    85      B     B     B     R      B     B     B     L     R      B
 90    90      B     B     B     R      B     B     B     L     R      B
 95    95      R     R     R     R      O     R     R     O     R      R
100   100      O     O     O     O      O     O     O     O     O      O
Score: 90L 95R





each stimulus trial. Of the ten trials given, eight involve an exposure 
of the stimulus on the right, and eight an exposure of the stimulus on the 
left. Thus, of the stimuli indicated across the top of Table 1, stimuli 
1, 2, 3, 4, 6, 7, 9, and 10 are used in scoring the right eye (stimuli 5 and 8 
having only left side exposures), whereas stimuli 1, 2, 3, 5, 6, 7, 8, and 10 










are used in scoring the left eye (stimuli 4 and 9 having only right side 
exposures). Although the responses to the fixation target numbers are 
not recorded, consistent errors in calling the numbers are noted and the 
subject is encouraged to watch the target more carefully. If the subject 
can give correctly seven out of the eight responses for each eye the angle 
is increased by five degrees and the same series of trials is given. If he 
cannot give seven out of eight responses correctly for each eye the angle 
is diminished until a point is reached where seven out of the eight re- 
sponses are given correctly for each eye. The angle is then increased in 
steps of five degrees, and the same series of ten trials is repeated until a 
point is reached where the subject states he cannot see the lights on 
either side or is consistently making errors so that the examiner is con- 
vinced the subject is guessing. All of the responses are recorded in the 
test trials. 

In the typical series shown in Table 1, the score on the first test is
90° for the left eye and 95° for the right eye. The reason for these scores
is that when an angle of 90° was used the only mistake made was on trial
8, which involved only the left eye, indicating that all eight trials
involving the right eye were correctly reported. When the angle was increased
to 95°, of the trials involving the left eye, trials 3, 5, 6, 7, 8, and 10 were
reported as if there were no light on the left side, although actually a
light appeared on the left side in all of these trials. Since fewer than
seven of the eight trials involving the left eye were correctly reported at
95°, the score for the left eye is recorded at 90°.

For the right eye, however, it will be noted that of the eight trials
involving the right eye at 95°, every one resulted in a response indicating
that the light on the right was seen whenever it was presented. Since
at the next step, 100°, all responses were wrong, the score for the right
eye was recorded at 95°.
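
The scoring rule just described can be sketched in a few lines. This is an illustration only, using the first-test rows of Table 1. The encoding is an assumption made for the example (B = both sides, R = right only, L = left only, O = neither), and the function and variable names are hypothetical.

```python
# Stimulus order across the ten trials of Table 1.
STIMULI = ["B", "B", "B", "R", "L", "B", "B", "L", "R", "B"]

# Sides lit (for a stimulus) or reported as seen (for a response).
SIDES = {"B": {"L", "R"}, "L": {"L"}, "R": {"R"}, "O": set()}

# First-test responses from Table 1, keyed by angle in degrees.
FIRST_TEST = {
    85: "B B B R L B B L R B".split(),
    90: "B B B R L B B R R B".split(),
    95: "R R R R R R R O R R".split(),
    100: ["O"] * 10,
}


def correct_count(side, responses):
    """Count the trials that lit `side` on which the subject reported it."""
    return sum(
        1
        for stim, resp in zip(STIMULI, responses)
        if side in SIDES[stim] and side in SIDES[resp]
    )


def score(side, record, criterion=7):
    """Largest angle at which at least `criterion` of the eight trials
    involving `side` were correct; None if no angle meets the criterion."""
    passing = [
        angle
        for angle, responses in record.items()
        if correct_count(side, responses) >= criterion
    ]
    return max(passing) if passing else None


left = score("L", FIRST_TEST)
right = score("R", FIRST_TEST)
print(left, right, left + right)
```

Applied to the first-test record, the sketch gives 90° for the left eye and 95° for the right, in agreement with the score entered on the record sheet, and an included angle of 185°.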


Experimental Procedure 


Subjects. Two hundred and two students enrolled in psychology
courses served as subjects. There were 96 males and 106 females.
Their ages ranged from 16 to 40 with an average of 21.1 and a standard 
deviation of 8.37. Two subjects were scheduled for each half hour 
period. 

Other Visual Skill Tests. The Bausch and Lomb Ortho-Rater was 
used to measure the visual skills of far vertical phoria, far lateral phoria, 
far acuity of both eyes, far acuity of right eye, far acuity of left eye, color 
vision, depth perception, near acuity of both eyes, near acuity of right 
eye, near acuity of left eye, near vertical phoria, and near lateral phoria. 

Each subject was given the perimeter test first and then the Ortho-Rater
test, followed by a retest on the perimeter. There was an approximate
lapse of fifteen minutes between the first test and the retest with the 
perimeter. About one-third of the subjects were tested on days when 
the weather was cold and clear with bright sunshine on freshly fallen snow. 
Because of the close scheduling of subjects, only about five minutes could 
be allowed for light adaptation before the first test. The room lights 
were on during the testing. 


Results 


Method of Scoring. The record sheet of each subject was first scored 
to determine how many trials resulted in correct response for each eye at 
every angular setting marked. The largest angular setting was noted 
for each eye where seven out of eight responses involving that eye were 
correct. The largest angular setting was also noted for each eye where 
all of the responses involving that eye were correct. The sum of the 
right and left eye readings determined the included angle for each set of 
responses. 

Reliability. The only published report found on the reliability of
perimeter tests is that of Low (3), who found the reliability of his
instrument, an arc perimeter, to be .91. This test required from 40 to 60
minutes to administer and was of a clinical nature.


Table 2

Average Angular Thresholds and Standard Deviations of Left, Right, Both, and Total
Fields for 7 out of 8 Correct Responses and 8 out of 8 Correct Responses
with Correlations between First Test and Retests of Each

Criterion: 7 of 8 (N = 202)
                                             S.E.            S.E.
                             Mean    S.D.    Mean    Diff.   Diff.   C.R.    r     S.E. r
Left Eye        First Test   92.2    7.82    .550
                Retest       94.7    6.70    .471    2.5     .331    7.55   .80    .025
Right Eye       First Test   93.3    7.26    .511
                Retest       96.0    7.07    [remaining entries illegible]
Included Angle  First Test  183.1   19.83   1.395
                Retest      188.0   12.45    .876    4.9     .827    5.93   .83    .022

Criterion: 8 of 8 (N = 169)
                                             S.E.            S.E.
                             Mean    S.D.    Mean    Diff.   Diff.   C.R.    r     S.E. r
Left Eye        First Test   90.9    7.91    .608
                Retest       93.9    7.25    .558    3.0     .390    7.69   .70    .030
Right Eye       First Test   91.7    7.73    .595
                Retest       94.7    6.55    .504    3.0     .434    6.91   .70    .039
Included Angle  First Test  180.2   14.49   1.115
                Retest      186.1   12.85    .988    5.9     .692    8.53   .79    .029

In the present investigation, correlations between the first test and the 
retest were obtained for the left eye, for the right eye, and for the in- 
cluded angle. The means and standard deviations of the scores as well 
as the correlations between the first test and the retest scores are given 
in Table 2. 

The correlations are slightly higher when seven out of eight correct 
responses were used as the criterion than when eight out of eight were 
used. Because of its wider range, the measure of included angle gives 
the largest coefficient of reliability. The right eye test-retest correlation 


Fig. 3. Frequency distribution of the total angular field. (Horizontal axis: included angle of visual field in degrees.)


gives the lowest coefficient of .75, the left eye a coefficient of .80, and
the included angle a coefficient of .83. 

The average scores on the retests were slightly, but significantly, 
higher than on the first tests. Evidently increased familiarity with the 
instrument and light adaptation account for the small increases in average 
scores. It was evident in the testing procedure that the subject usually 
erred only once in a series of ten trials if the test objects were located 
within his field of vision. A few check trials showed that when an error 
was made in this instance a second and immediate repetition of the stimu- 
lus combination usually resulted in a correct response. 
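
The significance test behind the statement that retest scores were higher can be illustrated as follows. The formulas below are the conventional ones for the standard error of a mean, the standard error of a difference between correlated means, the critical ratio, and the standard error of r. They are assumed here, since the article does not state them, although they reproduce the left-eye entries of Table 2 (seven-of-eight criterion) to within rounding. Function names are hypothetical.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean."""
    return sd / math.sqrt(n)


def se_diff_correlated(se1, se2, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


def se_r(r, n):
    """Large-sample standard error of a correlation coefficient."""
    return (1.0 - r ** 2) / math.sqrt(n)


# Left eye, criterion 7 of 8, values taken from Table 2.
N = 202
se_first = se_mean(7.82, N)
se_retest = se_mean(6.70, N)
se_d = se_diff_correlated(se_first, se_retest, 0.80)
critical_ratio = (94.7 - 92.2) / se_d

print(round(se_first, 3), round(se_retest, 3))
print(round(se_d, 3), round(critical_ratio, 2), round(se_r(0.80, N), 3))
```

The small discrepancy between the computed critical ratio and the tabled 7.55 disappears if the rounded standard error of the difference (.331) is used as the divisor.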

Individual Differences. Figure 3 is a frequency distribution of the 
included angle of the visual field. This curve was plotted from the retest 
results using seven out of eight correct responses as the criterion in
determining the score. The range is from 145° to 210°, or 65°. The mean is 
188° with a standard deviation of 12.45°. 

Figure 4 shows frequency distributions of the right eye field and the 
left eye field, again using retest results with seven out of eight responses 
correct as the criterion in determining the score. The mean angle for the 
right field, 96.0°, is slightly larger than the mean angle for the left field, 
94.7°. This difference has a critical ratio of 4.2, indicating that the field 
of vision on the right is significantly, although only slightly, wider on the 
average than the field of vision on the left. The correlation between the 
size of the right and left fields found from the retest results, and used in 





Fig. 4. Frequency distribution of the angular field of each eye. (Separate curves for the left eye and the right eye, N = 202; horizontal axis: left and right visual fields in degrees.)


determining the significance of the difference between the right and left 
fields, was .80, with an S.E. of .026. 

Figure 5 shows the frequency distribution of the male and female 
included angle visual fields. The males had a mean included angle of 
188.8°, S.D. = 8.85. The females had a mean included angle of 182.5°, 
S.D. = 14.1. The obtained difference of 6.3° between the mean included 
angle of the males and the females was 3.7 times as large as the S.E. of 
the difference, thus showing that the males have a slightly, but signifi- 
cantly, wider visual field than the females. 

Relation to Other Visual Skill Tests. Table 3 lists the correlations 













Fig. 5. Frequency distribution of the total angular field for males and females. (Separate curves for males and females; horizontal axis: included angle of visual field in degrees.)


between the included angle as measured by the perimeter and the various 
visual skills measured by the Ortho-Rater. Table 4 shows the correlations
between the near and far acuity of each eye with its visual field. 


Table 3

Correlations of Ortho-Rater Tests with Perimeter Tests of Included Angle

                               r      S.E. r
Far Vertical Phoria           .00     .071
Far Lateral Phoria           -.01     .071
Far Acuity, Both Eyes         .12     .069
Far Acuity, Right Eye         .24     .066
Far Acuity, Left Eye          .16     .069
Far Acuity, Worse Eye         .26     .066
Depth Perception              .17     .068
Color Discrimination          .05     .070
Near Acuity, Both Eyes        .09     .069
Near Acuity, Right Eye        .12     .069
Near Acuity, Left Eye         .13     .069
Near Acuity, Worse Eye        .13     .069
Near Vertical Phoria          .01     .071
Near Lateral Phoria           .09     .069










Table 4

Correlation of Ortho-Rater Left and Right Eye Acuity Tests with
Perimeter Tests of Each Field

                                             r      S.E. r
Far Acuity, Left Eye with Left Field        .18     .068
Near Acuity, Left Eye with Left Field       .13     .069
Far Acuity, Right Eye with Right Field      .24     .066
Near Acuity, Right Eye with Right Field     .11     .069








These obtained correlations and their standard errors indicate that
peripheral vision is not significantly related to these other visual skills,
except, possibly, far acuity, and even in this case the relationship is very low. 

The coefficient of correlation between age and peripheral vision was 
found to be .06, with an S.E. of .070. 

Comparison with Other Investigations. Low (3) found no appreciable 
correlations with age, sex, central acuity, or color vision. He concludes 
that the correlation of .39 which he found between central acuity and 
peripheral field has no practical predictive value. This conclusion is in 
accord with the findings of the present study. 


Summary and Conclusion 


The perimeter described is adaptable for industrial and laboratory
testing of peripheral vision limits because it tests rapidly and objectively,
has satisfactory reliability, and can be operated successfully by an
examiner without clinical training. The subject cannot falsify his
responses because he must fixate both eyes on the center fixation target, and
the examiner, by controlling the test object lamp flashes, can determine
when the subject’s responses are wrong. 

The research has revealed a rather wide range of individual differences 
in the extent of the visual fields. 

Peripheral vision, as measured by this instrument, has been found to 
be relatively independent of the visual skills of acuity, vertical and 
lateral phoria, depth perception and color discrimination. 


Received April 27, 1946. 


References 


1. Burnham, R. S. A Perihemisphere for visual measurements. J. exper. Psychol.,
1940, 27, 333-336.

2. Foerster, R. Vorzeigung des Perimeter. Klinische Monatsblätter für
Augenheilkunde, Stuttgart, 1869, 7, 411-422.

3. Low, F. N. Studies on peripheral visual acuity. Science, 1943, 97, 586-587.

4. Mayer, L. H. Light stimuli of minimal durations as a means of perimetry. Arch.
Ophthal., Chicago, 1935, 14, 541-553.

5. Pascal, J. I. An improved perimeter-campimeter. Arch. Ophthal., Chicago, 1937,
16, 103-105.

6. Purkinje, J. Beobachtungen und Versuche zur Physiolo. der Sinne, Prag, J. G.
Calve, 1823.

7. Thomasson, A. H. A plea for greater uniformity in methods of field taking. Arch.
Ophthal., Chicago, 1934, 12, 21-32.

8. Traquair, H. M. An introduction to clinical perimetry. London: Henry Kimpton,
Publisher, 1942.

9. Von Graefe, A. Archiv für Ophthal., Berlin, 1855.

10. Young, T. Mechanism of the eye. Royal Society of London, Philos. Trans.,
London, 1801.











Statistical Laboratory for Vision Tests at Purdue University 


S. Edgar Wirt 
Division of Applied Psychology, Purdue University 


Employee tests of vision in a number of industrial plants are being 
tabulated and analyzed in a statistical laboratory in the Division of 
Applied Psychology at Purdue University. This Occupational Research 
Center uses modern electric punched card tabulating equipment to per- 
form in a matter of minutes various types of statistical analyses that 
would require hours or days by other methods. 

Vision tests are given to employees in these different plants by persons 
who have attended an intensive two-weeks training course, the Industrial 
Vision Institute, at Purdue. The test scores, marked on a self-scoring 
record form, along with other pertinent personnel data, are sent to Purdue. 
Here the data are transferred to punched cards on a machine that is 
operated like a typewriter. One punched card contains a complete 
transcript of the record of one employee, including his name, number, 
department, job, age, experience, vision test scores, etc. This trans- 
cript is in the form of holes punched into the card, and also in printed 
letters and numbers along the edge of the card. These cards for each 
job or department are tabulated, analyzed, and reported to the company 
as routine work of this Occupational Research Center. 


Scattergrams 


One of the most frequent types of statistical analysis performed in 
this laboratory involves the preparation of a series of scattergrams, 
plotting in turn each of 14 different measures of vision against a measure 
of success in job performance. Sometimes there are several different 
measures of job success (such as rate of production, earnings, quality of 
work, absences, merit ratings, accidents, etc.) each of which may be 
plotted against each of the vision tests. Each of these scattergrams is 
tabulated and automatically printed, line by line, completely in less than 
one minute for a hundred cases. Column headings are pre-printed on a 
special paper form. The tabulating machine does the rest—printing 
automatically on each row of the scattergram the vertical or Y category, 
the cell frequencies, total frequency, and sum of all X scores for that Y 
category. One scattergram is completed by running the cards once 
through the tabulator. A grand total, summing the values in each 





column, requires a second run of the cards through the tabulator—again 
less than a minute for a hundred cases. 
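
For readers unfamiliar with the tabulated form, the contents of one printed scattergram can be sketched in modern terms. The example below is an illustration only: it computes, for each Y category, the cell frequencies, the total frequency, and the sum of X scores, and then the column totals that the second run supplies. The function names and the small data set are hypothetical and do not come from the laboratory's records.

```python
from collections import Counter, defaultdict


def scattergram_rows(pairs, x_categories):
    """Return one row per Y category (highest first) as a tuple of
    (y, cell frequencies in x_categories order, total frequency, sum of X)."""
    cells = defaultdict(Counter)
    for x, y in pairs:
        cells[y][x] += 1
    rows = []
    for y in sorted(cells, reverse=True):
        freqs = [cells[y][x] for x in x_categories]
        total = sum(cells[y].values())
        sum_x = sum(x * count for x, count in cells[y].items())
        rows.append((y, freqs, total, sum_x))
    return rows


def column_totals(rows):
    """Grand totals for each X column (the second run through the cards)."""
    return [sum(column) for column in zip(*(freqs for _, freqs, _, _ in rows))]


# Hypothetical (vision score, job rating) pairs for six employees.
pairs = [(1, 2), (2, 2), (2, 3), (3, 3), (3, 3), (1, 1)]
rows = scattergram_rows(pairs, x_categories=[1, 2, 3])
for row in rows:
    print(row)
print(column_totals(rows))
```

Each printed row corresponds to one line of the tabulator's output; the column totals are kept as a separate step to mirror the second pass of the cards.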

These scattergrams are the basis for the major work of the statistical 
laboratory, which is to evaluate the relations between visual requirements 
and job success for a particular job. The degree of relationship, as would 
be indicated by a coefficient of correlation, is not the most practical 
statement of this relationship. Instead it is necessary to determine for 
each vision test a critical score that may be recommended as a minimum 
or optimum for placement of an employee on a particular job. This can 
be done only on the basis of a scattergram. If a test is related to job 
performance, the better workmen will tend to fall predominately in one
part of the range of the test while the poorer workmen tend to fall
predominately in another part of the range.¹ The recommended critical 
score or “cut-off point” in the test range must help significantly and 


practically in differentiating between better and poorer workmen on the 
job. 


Statistics for Large Groups 


Another type of statistical analysis performed in the laboratory is the 
tabulation of vision test statistics based on large groups, including
subjects on different jobs, in different plants, or in different communities. 
The purpose of such large scale tabulations is to establish norms on the 
tests for different groups and to compare frequency distributions and 
group statistics among different groups. This requires frequency dis- 
tributions and scattergrams with large numbers of cases, classified by 
age, sex, job, length of experience on the job, section of the country, and 
so on. 

The capacity of the tabulator for such large scale studies is tremend- 
ous. With a maximum of 17 categories in the variable plotted horizon- 
tally, the scattergram described above can show a frequency up to 10,000 
in any one cell, and up to 100,000 in any row. With a maximum of 24 
categories in the horizontal variable it can show a frequency up to 1,000 
in any one cell and up to 10,000 in any row. This is in addition to the Y 
category designation and sum of scores in each row. 


Correlations 


A third type of statistical analysis performed in the bureau is the
computation of correlation coefficients, particularly intercorrelations for
multiple correlation or factor analysis. The coefficients are obtained by 


¹ Tiffin, Joseph, and Wirt, S. Edgar. Determining visual standards for industrial jobs
by statistical methods. Trans. Amer. Acad. Ophthal. and Otolar., 1945, Nov.-Dec.,
72-93.













a tabulating process that yields directly the constants for computing the
coefficients of correlation without going through the process of preparing
scattergrams.² The constants obtained are those necessary to solve the
formula: 


\[
r = \frac{N\,\Sigma XY - (\Sigma X)(\Sigma Y)}
         {\sqrt{N\,\Sigma X^{2} - (\Sigma X)^{2}}\;\sqrt{N\,\Sigma Y^{2} - (\Sigma Y)^{2}}}
\]


which involves only raw scores for each intercorrelation. Intercorrelations
between n variables require that the data cards be put through the
tabulator n times, at a speed of 150 cards per minute. (For a 2-digit
or 3-digit variable the cards may have to be put through two or three times 
instead of only once.) The number of variables and of cases that can be 
handled at one time is limited only by the size of the sums of the separate 
variables. The number of digits in sums of all the variables can not 
exceed 80. This permits 20 variables with sums of four digits each, 13 
variables with sums of six digits each, and so on. For greater numbers 
the variables must be divided into two sections and intercorrelations 
plotted separately for sections aa, ab, and bb. 

N (number of cases) and the ΣX (sum) for each variable are read
directly from the tabulated report. Each ΣX² (sum of squared scores)
and ΣXY (sum of cross products) is obtained by summing on an adding
machine a column of figures produced in the tabulated report. The only
remaining chore is to substitute in the formula and calculate the
coefficients of correlation. 
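
The final substitution step can be illustrated with a short sketch of the raw-score formula. The constants are accumulated here in code rather than read from a tabulated report, and the function names and the five pairs of scores are hypothetical values chosen only to show the arithmetic.

```python
import math


def raw_score_sums(xs, ys):
    """Accumulate the constants required by the raw-score formula."""
    return {
        "N": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


def pearson_r_from_sums(s):
    """Correlation coefficient computed from the constants alone."""
    numerator = s["N"] * s["sum_xy"] - s["sum_x"] * s["sum_y"]
    denominator = math.sqrt(s["N"] * s["sum_x2"] - s["sum_x"] ** 2) * math.sqrt(
        s["N"] * s["sum_y2"] - s["sum_y"] ** 2
    )
    return numerator / denominator


# Hypothetical scores for five cases.
sums = raw_score_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r = pearson_r_from_sums(sums)
print(sums)
print(round(r, 4))
```

Because only the six constants enter the computation, no scattergram or deviation scores are needed, which is the point of the tabulating method described.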








Item Analysis 


A fourth type of statistical tabulation by machine is item analysis.
In a battery of yes-no or multiple choice questions, each question must
correlate with a criterion. This criterion may be the total score on the
battery of questions, in which case the item must correlate with the total.
Or the criterion may be some measure of success in another endeavor, in
which case the item must contribute towards a prediction of success on
this other measure. There are various short-cut methods³ for determining
the value or effectiveness of an item in a battery of test questions,
but each method is based on a frequency count of each possible answer on
each item. 

A modification of the tabulator method for producing scattergrams
makes it possible to produce frequency distributions of all possible
answers simultaneously for 12 dual-choice questions, 8 three-choice
questions, 6 four-choice questions, and so on, by putting a set of cards once
through the tabulator. A different method provides an item count
simultaneously on forty questions (25 questions for frequencies of 100 or
more) by putting the cards three times through the machine for a battery
of three-choice answers, five times for five-choice answers, and so on.


² Warren, Richard, and Mendenhall, Robert M. The Mendenhall-Warren-Hollerith
correlation method. New York, Columbia University, 1929.

³ Lawshe, C. H. A nomograph for estimating the validity of test items. J. appl.
Psychol., 1942, 26, 846-849.
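
The frequency count on which item analysis rests can be sketched as follows. This is a minimal illustration, not the tabulator wiring. Each answer sheet is assumed to be a sequence with one answer per item, and the function name and sample sheets are hypothetical.

```python
from collections import Counter


def item_answer_counts(answer_sheets, n_items):
    """Frequency of each possible answer on each item, one Counter per item."""
    counts = [Counter() for _ in range(n_items)]
    for sheet in answer_sheets:
        for item_index, answer in enumerate(sheet):
            counts[item_index][answer] += 1
    return counts


# Three hypothetical answer sheets for a three-item, dual-choice battery.
sheets = ["ABA", "ABB", "BBA"]
counts = item_answer_counts(sheets, n_items=3)
for number, item_counts in enumerate(counts, start=1):
    print(number, dict(item_counts))
```

A single pass over the sheets yields every item's distribution at once, which parallels the one-run tabulation described for small numbers of answer categories.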


Listing 


Another use of the tabulating machine is to list, or transcribe, all or 
part of the data punched on cards. (1) All cards in each set of data may 
be completely transcribed, a line for a card, as a permanent file record; 
any card that should become damaged or lost can be reproduced from this 
record. (2) Suitable descriptive headings are punched in cards and listed 
automatically at the beginning of each tabulated report. (3) The report 
to a cooperating company includes a list, produced on the tabulator, of 
names of employees whose visual skills do not meet the minimum re- 
quirements for the job—requirements that have been determined on a 
factual basis by statistical analysis. The company may notify these 
employees individually concerning their handicap and refer them to eye 
doctors for visual care, which in most instances can help them to meet 
requirements for the job. 


The Laboratory 


The work of the statistical laboratory is largely routine. Sets of 
data come in, are processed, and reported back to the companies that 
collected the data. Special projects are fitted into this routine schedule. 
Special projects may be special studies requested in connection with the 
vision tests, other studies on personnel tests of various types, studies on 
merit rating, job evaluation, research studies, and so on. The present 
volume of such statistical work requires a staff of two psychologists, two 
tabulating machine technicians, and a secretary. 

The equipment includes calculators, files for data and punched cards, 
and the following International Business Machines tabulating equipment: 


1 Alphabetic Accounting Machine with 80 counters, 88 print bars, 
2 digit selectors, progressive totals, 20 comparing relays, 7 plug- 
boards, and a speed of 80/150 cards per minute. 

Counting Sorter with zone and digit selectors, and a speed of 400 
cards per minute. 

Alphabetic Printing Punch 

Alphabetic Verifier 

Reproducing Punch, with gang punch, 80 columns of comparison. 










Pending delivery of the last two items, the Printing Punch has served
as reproducing punch and gang punch, the Tabulator has served as
verifier, and the Counting Sorter has helped out on tabulation.

This statistical laboratory, with respect to equipment, personnel, and 
operating cost, is subsidized by the Bausch & Lomb Optical Company as 
part of a cooperative research project on occupational vision. It was 
evolved over a period of several years and has been doing routine research 
on vision in industry since July 1944. Another part of this project was 
the development by Bausch and Lomb of the Ortho-Rater, a battery of 
standardized precision vision tests validated for industry by Purdue 
University. These tests are produced by Bausch and Lomb. It is 
these standardized tests used in industry that are the basis for all the 
routine and some of the special statistical work of the laboratory. A 
third part of the project is the Industrial Vision Institute, an intensive 
two-weeks course at Purdue, given several times a year for representatives 
of industries that are using the Ortho-Rater and the Purdue Occupational 
Research Center in their own vision programs. 

This method of research on occupational vision was developed in 
three stages. First, staff research men at Purdue developed this research 
approach to industrial problems of vision, made studies in various plants, 
and reported them back to the management. Second, representatives of 
industry were taught this procedure, with which they made studies in 
their own plants, using semi-mechanical methods of card tabulation. 
Third, the Occupational Research Center was developed to analyze data 
that has been gathered in these plants by management personnel. This 
has resulted in a very large and growing collection of research data on 
occupational vision at Purdue University. 


Received May 22, 1946. 





Motor Performance of Normal Young Men Maintained on 
Restricted Intakes of Vitamin B Complex * 


Josef Brozek, Harold Guetzkow, Olaf Mickelsen, and Ancel Keys 
The Laboratory of Physiological Hygiene, University of Minnesota 


Clinical accounts of vitamin B-complex deficiencies emphasize general 
weakness, incoordination, and other indications of neuro-muscular 
deterioration. Such symptoms are not specific to vitamin deficiencies 
and are difficult to evaluate objectively in either clinical or survey studies. 

In controlled experiments quantitative description of voluntary 
muscular performance is possible and has been utilized in several investi- 
gations in this Laboratory (14, 15, 16). Reliance can be placed on the 
results of psychomotor tests provided that (1) the methods and their 
applications are rigorously standardized, (2) stability of performance in 
the control (pre-experimental) period is secured by adequate training, 
(3) possible additional practice effects in the experimental group are 
accounted for by the use of a strictly comparable control group, and (4) 
all hints concerning the subjects’ nutritional status or suggestions of 
symptoms are scrupulously avoided. 

The B vitamins can be regarded as a group not only because of some
common physical properties but because their natural distribution in
diets frequently leads to rather parallel degrees of adequacy or deficiency
for the several components of the complex. The work in this Laboratory
has been devoted principally to thiamine, riboflavin, and niacin. Each
of these enters into fundamental enzyme reactions of muscle and nerve
metabolism (10). Restricted intakes of these B-complex vitamins might
act as limiting factors in human motor performance even at levels well
above obvious clinical deficiencies.

* The work described in this paper was done under a contract, recommended by the
Committee on Medical Research, between the Office of Scientific Research and
Development and the Regents of the University of Minnesota. Important financial
assistance was also provided by the Nutrition Foundation, Inc., the U. S. Cane Sugar
Refiners’ Association, N. Y., the Corn Industries Research Foundation, N. Y., Swift
and Co., Chicago, the National Confectioners’ Association, Chicago, the National Dairy
Council, Chicago, and the Graduate Medical Research Fund, University of Minnesota.
Merck and Co., Inc., provided a generous supply of pure vitamins. Most of the food
materials were supplied by the Subsistence Branch, Office of the Quartermaster General,
U. S. Army. We appreciate the constant assistance of the members of the Laboratory
staff, particularly Dr. Austin Henschel, Dr. Henry Longstreet Taylor, and Miss Angie
Mae Sturgeon. Dr. Howard Alexander, assisted by Messrs. Norris Schulz and Ralph
Michener, handled the statistical analyses. Mr. Ersal Kindel constructed the test
equipment.

This is a report on the psychomotor performance of normal young 
men maintained on diets restricted in the B vitamins but otherwise 
adequate. Thiamine, riboflavin, and niacin were regularly checked by 
analysis of the diet. The other members of the B-complex are assumed 
to have been supplied in parallel amounts since the diet was composed of 
varied natural foodstuffs. The general experimental program and pro- 
cedures have been described in greater detail elsewhere (17). 


Experimental Program 


The subjects in this experiment were eight men, 20 to 32 years of age. 
They were emotionally and physically normal and free from signs or 
history of nutritional abnormalities which might have affected their 
vitamin requirements or ability to do moderately hard physical work. 
Before they served as volunteer subjects in this experiment the men had 
been for some months in Civilian Public Service Camps for conscientious 
objectors. In so far as could be ascertained the camp diets did not ex- 
ceed the recommendations of the National Research Council (19). The 
men were under supervision throughout the experiment and no food other 
than that provided in the controlled diet was permitted at any time. 

The experiment consisted of four consecutive parts: 


I) The “standardization” period of 41 days, including a period of 25 days 
of hard physical work reported previously (16). 

II) The prolonged “partial restriction” period of 161 days on a diet 
providing from one-third to two-thirds of the B-vitamin allow- 
ances recommended by the National Research Council (19). 

III) The “acute deficiency” period of 23 days on a diet almost devoid of 
B vitamins. 

IV) The “thiamine supplementation” period of 10 days in which the 
“acute deficiency” diet of the experimental subjects was abun- 
dantly supplemented with thiamine. The intakes of riboflavin 
and niacin were maintained on the same level as in the “acute 
deficiency.” 


During the standardization period the subjects were maintained on a 
uniform diet and followed an identical physical exercise schedule. In the 
psychomotor tests standardization involved training the subjects to a 
high and consistent level of performance. 

Direct analyses of samples of meals eaten during the partial restriction
period showed average intakes, per 1,000 Cal., of 0.185 mg. of thiamine,
0.287 mg. of riboflavin, and 3.71 mg. of niacin. The four men who
served as experimental subjects received placebos, while the four control
subjects were given supplements as indicated in Table 1. Each subject
received additional daily supplements of 25 mg. of pyridoxine, 25 mg. of
ascorbic acid, 5,000 I.U. of vitamin A, and 170 I.U. of vitamin D. The
energy expenditure during this period was about 3,300 Cal. per day. The
physical work on a motor-driven treadmill and the testing program were
rigidly standardized. In the restricted group the average body weight,
taken nude and before breakfast, was 141.8 lbs. during the standardization
period and 140.5 lbs. at the end of partial restriction. The average
weight of the supplemented subjects was 148.3 lbs. before the start of the
experiment and 145.5 lbs. at the end of the partial restriction.


Table 1
Design of the Experiment: B-vitamin Intake, in mg. per 24 hours
d = diet, s = supplement, p = placebo

[The table lists, for each nutritional regimen, the thiamine, riboflavin,
and niacin supplied by diet, supplement, or placebo, together with the
N.R.C. Recommended Dietary Allowances. The regimens and dates are: partial
restriction, 6/7-11/14; acute deficiency, 11/15-12/7; thiamine
supplementation, 12/8-12/17, in two sub-periods, the first ending 12/12.
The individual entries are not legible in this copy.]

Notes to Table 1:

Saturation-test dose was given on 12/7, the 23rd day of deficiency, at 6:00 p.m.

The dietary regimen preceding partial restriction has been described in a
separate report (14). It contained 1 mg. thiamine, 1 mg. riboflavin, and
10 mg. niacin.

Subject Ja developed an upper respiratory infection and was dropped from the
experiment on Dec. 4th.

In the period of “acute deficiency” and “thiamine supplementation” the control
group received daily synthetic vitamins (1 mg. thiamine, 1 mg. riboflavin,
10 mg. niacin) plus 12 gr. of dried yeast containing approximately [illegible]
mg. thiamine, 0.5 mg. riboflavin, and [illegible] mg. niacin.

These daily amounts were recommended by the Food and Nutrition Board of the
National Research Council (19) for moderately active men (3,000 Cal.) of
average weight (154 lbs.).
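The per-1,000-Cal. figures quoted above can be turned into approximate daily intakes for the partial restriction period. The sketch below assumes that energy intake matched the stated expenditure of about 3,300 Cal. per day, which the nearly stable body weights make plausible but which is an assumption here; the names are illustrative.

```python
# Analysed intakes during partial restriction, mg. per 1,000 Cal. (from the text).
PER_1000_CAL_MG = {"thiamine": 0.185, "riboflavin": 0.287, "niacin": 3.71}

# Assumption: daily energy intake equals the stated expenditure.
DAILY_CAL = 3300


def daily_intake_mg(per_1000_cal, daily_cal):
    """Scale per-1,000-Cal. intakes to a daily energy intake."""
    return {name: mg * daily_cal / 1000 for name, mg in per_1000_cal.items()}


daily = daily_intake_mg(PER_1000_CAL_MG, DAILY_CAL)
for name, mg in daily.items():
    print(f"{name}: {mg:.2f} mg. per day")
```

Under that assumption the diet supplied roughly 0.6 mg. of thiamine, 0.9 mg. of riboflavin, and 12 mg. of niacin per day.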

During the period of acute deficiency all men were fed a synthetic diet 
composed largely of cornstarch, sugar, vegetable shortening, and purified 
casein, to which minerals and vitamins were added to produce a “balanced”
diet except for absence of the B complex. A pair of men from
each of the two groups used in the partial restriction was supplemented 
during the acute deficiency. The distribution of subjects into restricted 
and supplemented groups is summarized in Table 2. 


Table 2
Distribution of Subjects in the Different Phases of this Experiment

Subjects    Status in Partial Restriction    Status in Acute Deficiency
G, Wi       Restricted (R)                   Deficient (RD)
S, Ja       Supplemented (S)                 Deficient (SD)
Wa, T       Restricted (R)                   Supplemented (RS)
N, Jo       Supplemented (S)                 Supplemented (SS)





For the first two weeks of the acute deficiency, all subjects were able
to do hard physical work which increased their caloric expenditure to
about 4,000 Cal. per day. During the third week, two restricted-deficient
subjects (RD) were unable to continue their treadmill work.
One supplemented-deficient subject (SD) who became ill with an upper
respiratory infection was dropped from the experiment. The three
experimental subjects weighed, on the average, 128 lbs. at the start and
121 lbs. at the end of the acute deficiency. The average weights of the
supplemented group were 152 lbs. and 153 lbs., respectively.


Methods 


In most cases, the actual performance of a motor task is the best
method we know for estimating the potential performance, the “work
capacity,” in that task. Performance in many types of work, particularly
when the component of coordination is prominently involved, can
be predicted more accurately from a tryout performance than on the
basis of measurable neurological, muscular, and other physiological or
biochemical characteristics.





Fig. 1. A general view of the treadmill and of the equipment for the tests of
speed and coordination.


The psychomotor battery used in this experiment included tests of
strength, speed, and coordination. Standard dynamometers were used
for measuring strength. In the test of speed of tapping a stylus was used
to strike alternately two plates separated by a small barrier; the number
of taps in the first and last 10 seconds of a half-minute tapping period was
recorded by an impulse counter. In the test of speed of gross body
reaction the subject, while walking on the treadmill, was required to turn
off the lighted one of three bulbs by bending over and striking the proper
key. The keys were placed 18 inches from the floor of the treadmill.
The reaction-time score is the average time of fifty reactions. The pattern
tracing test involved eye-hand coordination; the speed of tracing the
pattern was kept constant and the scores consist of the number of contacts
between the stylus and the side of the pattern, and of the total duration
of these contacts. The ball-pipe test measured the speed of forearm and
hand movements involved in dropping ball-bearings through a one-foot
conduit pipe (5).


Fig. 2. A subject performing the test of gross body reaction time.

All of these tests, with the exception of the two dynamometers, were
performed while the subject was walking on a treadmill. In clinical
B-complex deficiencies the peripheral neuropathy is reported to affect first
the lower extremities. Because of the possible development during
acute deficiency of difficulties in walking, some psychomotor measurements
that did not involve walking were needed. Therefore, two additional
tests which could be taken in a seated position were applied. In
the test of toe reaction-time the subject reacted to auditory stimuli by
flexing the big toe of the left foot and lightly pressing against a wooden
board which stopped the timer. The Minnesota Rate of Manipulation
test (28) was used to measure the speed of finger movements. The
manipulation and toe reaction tests were applied only during the acute
deficiency.





Fig. 3. A close-up of the pattern tracing board and the two tapping plates. The
lights provide the “ready,” “go,” and “stop” signals for the tapping test.

Active cooperation on the part of the subjects is one of the necessary 
prerequisites for the valid use of psychomotor tests. This requirement 
taxes the skill of the experimenter in maintaining optimal motivation; 
this is particularly true when the testing sessions are spread out over a 
period of months. Data obtained in this and other experiments (15) 
demonstrate that a relatively constant, high level of motivation can be 
achieved. 

In designing an experiment which would permit an evaluation of 
possible psychomotor deterioration, it is advantageous to bring perform- 
ance in each test to a practice plateau before the start of the experimental 
period. Performance deterioration can then be measured without
practice effects entering as a complicating factor. During the
standardization period our subjects were given 30 practice trials on the
psychomotor tests used throughout this experiment.

The performance fluctuations from trial to trial in a number of the 
psychomotor tests are no larger than the variability of many standard 
physiological measurements. The percentage individual and group 
variabilities were computed for five plateau trials of the standardization 
period (Table 3). 


Table 3
Trial-to-trial Fluctuation of Psychomotor Performance in the Standardization
Period After Two Weeks of Intensive Practice *

                                                  Individual     Group
Test                                              Variability    Variability
1. Strength, hand-grip                               4.0%          1.6%
2. Strength, back-lift                              11.7%          2.2%
3. Speed, initial tapping                            5.8%          0.9%
4. Speed, terminal tapping                           4.3%          1.1%
5. Speed, gross body reaction time                   5.3%          0.9%
6. Speed and coordination, ball-pipe                 4.1%          1.0%
7. Coordination, pattern-tracing, time of errors    17.0%          6.7%
8. Coordination, pattern-tracing, number of errors  13.2%          5.5%

* The individual and group trial-to-trial variabilities were calculated according to
the following formulae:

\[
\text{Individual variability} = \sqrt{\frac{\sum_{i=1}^{8}\sum_{t=1}^{5}(x - \bar{x})^{2}}{8 \times 4}}
\]

\[
\text{Group variability} = \sqrt{\frac{\sum_{t=1}^{5}(M_t - M)^{2}}{4}}
\]

Both measures of variability are expressed as percentages of M. The symbols are
defined as follows:

x = score of an individual
\bar{x} = mean of his five scores obtained in trials on five successive days
t = trials
i = individuals
M_t = mean of the eight individual scores for a trial
M = mean of the five trial-means
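The two variability measures of Table 3 can be sketched in Python. The footnote formulae are only partly legible, so this sketch assumes the individual measure is a pooled within-subject standard deviation and the group measure is the standard deviation of the trial means, both expressed as percentages of the grand mean M. The function name and the sample scores are hypothetical.

```python
from math import sqrt


def trial_variabilities(scores):
    """scores[i][t] is the score of individual i on plateau trial t.

    Returns (individual variability, group variability), both as
    percentages of the grand mean of the trial means.
    Assumed forms: pooled within-subject SD and SD of the trial means.
    """
    n_subj = len(scores)
    n_trials = len(scores[0])

    # Pooled deviation of each score from that subject's own mean.
    subj_means = [sum(row) / n_trials for row in scores]
    within_ss = sum(
        (x - m) ** 2 for row, m in zip(scores, subj_means) for x in row
    )
    individual_sd = sqrt(within_ss / (n_subj * (n_trials - 1)))

    # Deviation of each trial mean from the mean of the trial means.
    trial_means = [sum(row[t] for row in scores) / n_subj for t in range(n_trials)]
    grand_mean = sum(trial_means) / n_trials
    group_ss = sum((m - grand_mean) ** 2 for m in trial_means)
    group_sd = sqrt(group_ss / (n_trials - 1))

    return 100 * individual_sd / grand_mean, 100 * group_sd / grand_mean


# Hypothetical hand-grip scores (kg.) for two subjects on five trials.
scores = [[50, 52, 51, 49, 53], [60, 61, 59, 62, 58]]
individual_pct, group_pct = trial_variabilities(scores)
print(f"individual {individual_pct:.1f}%, group {group_pct:.1f}%")
```

Averaging over subjects before taking the spread removes much of the individual day-to-day scatter, which is why the group figures in Table 3 are consistently smaller than the individual ones.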


During the period of prolonged partial restriction the psychomotor 
functions were measured at intervals of two to three weeks. In the sub- 
sequent period of acute deficiency the psychomotor tests were given at 
weekly intervals. All measurements were made in duplicate. 

At the end of partial restriction some equipment changes were nec- 
essary. For example, in the pattern-tracing test the tip of the stylus, 
which had been flattened out by frequent use, had to be replaced by a 
more rounded one; this made the test more difficult. The line starter
for the reaction-time test was rebuilt so that the timer followed more
quickly the onset of the electrical light stimuli; this had the effect of
lengthening the reaction-time. Because of such changes, the two experi-
mental periods will be evaluated separately. 


Results 


The control scores obtained at the start of the prolonged partial
restriction period are given in Table 4. The data for partial restriction are
summarized in Table 5. The data obtained in acute deficiency, except
for toe reaction-time and the manipulation test, are presented in Tables
6 and 7.


Table 4
Control Scores at the Start of the Period of Partial Restriction
Note: Tests and units are as follows: 1) Hand-grip, in kg. 2) Back-lift, in kg.
3) and 4) Tapping, number of taps in the initial and terminal ten seconds of a half-
minute work period. 5) Gross body reaction-time, in 1/120 sec. 6) Ball-pipe, number
of passages of the ball through the conduit pipe in one minute. 7) Pattern tracing,
duration of contact errors, in 1/120 sec. 8) Pattern tracing, number of contact errors.

Group                                   Restricted Subjects             Supplemented Subjects
Test                                    G     Wi    Wa    T     Mean    N     S     Jo    Ja    Mean
1. Strength, hand-grip                  47    43    63    66    55      63    58    53    57    58
2. Strength, back-lift                  --    122   190   156   156     231   146   171   156   176
3. Speed, initial tapping               62    59    70    68    67      63    69    63    61    64
4. Speed, terminal tapping              61    64    6?    60    62      57    6?    57    53    58
5. Speed, gross body reaction time      40    48    4?    48    4?      42    41    4?    4?    43
6. Speed and coordination, ball-pipe    72    8?    75    69    74      53    6?    77    69    65
7. Coordination, pattern-tracing,
   time of errors                       226   152   95    173   162     183   176   198   239   199
8. Coordination, pattern-tracing,
   number of errors                     40    29    20    34    31      33    37    38    44    38





Table 5
Changes in Scores from Control to Terminal (153rd day) Performance in the
Period of Partial Restriction

[The table gives, for each of the eight tests, the changes in score for the
restricted subjects and the supplemented subjects, the two group means, and
the t-test. The individual entries are not legible in this copy.]

With 6 degrees of freedom, the 5% level of t is 2.45; the 1% level is 3.71.
* Single asterisk indicates significance between 5% and 1% levels.


In general, there was only a slight difference in the performance of the
experimental and the control group in the period of partial restriction.
In two of the eight psychomotor tests continued practice produced larger
improvement in the scores of the controls. This possible suggestion of
slight impairment in the experimental group had its counterpart in the
marginal disturbance of carbohydrate metabolism, reflected in a small
increase in the blood pyruvate level (17). It also paralleled small changes
obtained in the Rorschach.


Table 6
Control Scores at the Beginning (4th day) of Acute Deficiency

Group       Deficient                        Supplemented
Subgroup    RD    RD    SD    SD             RS    RS    SS    SS
Subjects    G     Wi    S     (Ja)*  Mean    Wa    T     N     Jo    Mean

[Only part of each row is legible in this copy; the surviving entries are
given in their printed order.]

1. Strength, hand-grip                               51  56  (58)  55  65  7  63
2. Strength, back-lift                               --  (158)  138  153  239
3. Speed, initial tapping                            61  (62)  67  71  65
4. Speed, terminal tapping                           57  (50)  59  61
5. Speed, gross body reaction time                   57  (87)  61  77
6. Speed and coordination, ball-pipe                 73  (67)  78  59
7. Coordination, pattern-tracing, time of errors     126  (203)  150
8. Coordination, pattern-tracing, number of errors   21  (34)  26

* Ja was dropped from the experiment before its completion because of a respiratory
infection.

During acute deficiency the psychomotor performance deteriorated 
markedly in the experimental group. These functions were among those 
aspects of fitness which showed an early deterioration, immediately 
following the gastrointestinal disturbances. The psychomotor tests 
were more sensitive to the “stress” of vitamin deficiency than many of 
the metabolic, neurological, and cardiovascular tests (17). 







Terminal supplementation with thiamine alone for 10 days led to 
recovery in the tests of speed and of coordination. The small decrease in 
grip strength which appeared in the last week of deficiency was still 
present at the 10th day of supplementation. 

These results will be discussed in detail for each test separately. 


Table 7
Changes in Scores from Initial (4th day) to Terminal (23rd day) Performance in
Period of Acute Deficiency

Group                                   Deficient                 Supplemented
Subgroup                                RD    RD    SD            RS    RS    SS    SS
Test                                    G     Wi    S     Mean    Wa    T     N     Jo    Mean    t-test
1. Strength, hand-grip                  -2    -3    -3    -2.7     4    -2     0     3     1.2     2.37
2. Strength, back-lift                  --   -20    21     0.5   -16    -7    -3    16    -2.5     0.18
3. Speed, initial tapping               -7   -12     1    -6.0     3     2     7    -4     2.0     1.93
4. Speed, terminal tapping              -8    -6     1    -4.3     5     1     4    -2     2.0     2.15
5. Speed, gross body reaction time      18    66    26    36.7     2     2    -3    11     3.0     2.61*
6. Speed and coordination, ball-pipe    -7    -7    -9    -7.7     2     3    -3     8     2.5     3.77*
7. Coordination, pattern-tracing,
   time of errors                      118   106   109   111.0   -32   -66   -97   -49   -61.0    10.31**
8. Coordination, pattern-tracing,
   number of errors                     24    11    13    16.0    -2   -12   -17   -12   -10.8     5.32**

With 5 degrees of freedom, the 5% level of t is 2.57; the 1% level is 4.03.
* Single asterisk indicates significance between 5% and 1% levels.
** Double asterisks indicate significance at better than 1% level.
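The paper does not state how the t values of Table 7 were computed. They appear consistent with an ordinary pooled-variance two-sample t-test on the individual changes, with 3 + 4 - 2 = 5 degrees of freedom; the sketch below rests on that assumption. The function name `pooled_t` is illustrative, and the data are the individual changes as read from Table 7.

```python
from math import sqrt


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic (absolute value) and its df."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Sum of squared deviations within each group, pooled over both groups.
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    standard_error = sqrt(ss / df * (1 / na + 1 / nb))
    return abs(mean_a - mean_b) / standard_error, df


# Individual changes: deficient (G, Wi, S) and supplemented (Wa, T, N, Jo).
rows = {
    "hand-grip": ([-2, -3, -3], [4, -2, 0, 3]),
    "gross body reaction time": ([18, 66, 26], [2, 2, -3, 11]),
    "pattern-tracing, time of errors": ([118, 106, 109], [-32, -66, -97, -49]),
}
for test, (deficient, supplemented) in rows.items():
    t, df = pooled_t(deficient, supplemented)
    print(f"{test}: t = {t:.2f}, df = {df}")
```

For these three rows the sketch gives 2.37, 2.61, and 10.31, matching the tabled values. Other rows may differ in the second decimal because the tabled changes are rounded.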


Strength. Simple strength, measured by hand-grip and back-lift 
dynamometers, was in general remarkably stable. During the pro- 
longed period of partial restriction there was no deterioration in either 
group; in fact the restricted group showed a slight but statistically not 
significant increase in hand grip. In the acute deficiency period the 
experimental group exhibited a very slight decrease in grip strength. 

Speed: Finger Movements. The test of speed of finger movements
was practiced at the end of the period of partial restriction but was
experimentally used only in acute deficiency. The deficient group showed
a slight decrease in efficiency, with a score of 87.4 at the fifth day and
84.6 on the twenty-third day of acute deficiency; the score is the number
of discs turned over in the last minute of a five-minute work period. The
control group continued to improve, having an initial score of 89.6 and
a terminal score of 94.2. The difference between the two groups in
the change from the beginning to the end of acute deficiency was not
statistically significant, t = 2.14 (t at 5% level = 2.57).

Speed: Tapping. There was no loss of motor speed, as determined 
by performance on the two-plate tapping test, during prolonged partial 
restriction. There were changes in the positive direction, both groups 
slightly improving as a result of continued practice. The mean gain of 
the supplemented group was statistically significantly higher for scores 
obtained in the initial ten seconds of the half-minute work period, but 
the gains in the terminal ten second scores did not differentiate the two 
groups. Throughout the period of acute deficiency, the performance of 
the deficient subject S who was supplemented during the preceding 5 
months remained unchanged. The two restricted-deficient subjects 
exhibited a deterioration of performance in both tapping scores. 

Speed: Gross Body Reaction. The average time involved in selecting 
and striking a proper telegraph key while walking on the treadmill in- 
creased slightly during prolonged restriction in both groups. This is 
attributable to increased slippage in the brake mechanism of the timer. 
There was no significant difference between the reaction-time scores of 
the restricted and the supplemented group. In the subsequent period 
of acute deficiency the performance of the deficient group deteriorated 
markedly. The difference between the increases of the two groups in 
reaction times for the first day and the twenty-third day was statistically 
significant at the 5% level. Biologically, this increase in reaction time 
was more important than the 5% level of significance might indicate. 
In terms of percentages, the deficient group exhibited a 60% decrease in 
this aspect of fitness between the fourth and the twenty-third day of 
acute deficiency, as compared with a change of only 4% in the supple- 
mented group. 
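The percentages quoted for gross body reaction time can be checked against the tabled figures. The sketch below takes the mean changes of 36.7 and 3.0 from Table 7 and assumes initial group means of 61 and 77, as read from the partly legible Table 6. Those two baselines are an assumption rather than values stated in the text.

```python
def percent_change(change, baseline):
    """Change expressed as a percentage of the initial score."""
    return 100 * change / baseline


# Mean changes from Table 7; baselines assumed from Table 6 (units of 1/120 sec.).
deficient = percent_change(36.7, 61)
supplemented = percent_change(3.0, 77)
print(f"deficient: {deficient:.0f}%, supplemented: {supplemented:.0f}%")
```

With these inputs the deficient group's reaction time lengthens by about 60% and the supplemented group's by about 4%, in line with the figures in the text.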

Speed: Toe Reaction. In the other reaction-time test, the subject
reacted with his big toe to an auditory stimulus. The toe reaction-time
score was the average of fifty reactions. The deficient subjects averaged
43 on the third day and 58 on the twenty-third day of the acute deficiency
period; the comparable mean scores for the supplemented group were 44
and 45, all scores being given in 1/120 second. The difference between
the mean changes from start to end of acute deficiency was significant at
better than the 5% level, t = 3.07 (t at 5% level = 2.57). The
toe-reaction test did not appear to be more sensitive to the experimentally
produced B-complex deficiency than the test of the gross body
reaction-time.
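Because the reaction-time scores are recorded in units of 1/120 second, it may help to see them in milliseconds. The following sketch converts the toe reaction-time means quoted above; the function name is illustrative.

```python
def units_to_ms(score, units_per_second=120):
    """Convert a timer score in 1/120-second units to milliseconds."""
    return 1000 * score / units_per_second


# Mean toe reaction-time scores from the text, at the start and end of acute deficiency.
for group, (start, end) in {"deficient": (43, 58), "supplemented": (44, 45)}.items():
    print(f"{group}: {units_to_ms(start):.0f} ms. -> {units_to_ms(end):.0f} ms.")
```

On this scale the deficient group slows from about 358 ms. to about 483 ms., while the supplemented group changes by less than 10 ms.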

Speed and Coordination: Ball-pipe. In the ball-pipe test the scores in 
the prolonged partial restriction showed slight effects of continued 
practice; this was more pronounced in the supplemented group than in 
the restricted group. In the subsequent period of acute deficiency there 
was definite evidence of deterioration in the unsupplemented group. 
The difference between the average change in the experimental and the 
control group from the start to the end of the deficient diet approaches 
the 1% level of statistical significance. 

Coordination: Pattern Tracing. In the pattern-tracing test the scores 
continued to improve during the whole length of the prolonged partial 
restriction period. This was largely an artifact due to the gradual 
flattening of the point of the tracing stylus which made the task easier. 
Both the restricted and the supplemented group reduced the initial 
scores in approximately the same ratio. This ratio was 0.49 and 0.47 
for the time-score of the restricted and the supplemented groups respec- 
tively, and 0.61 and 0.58 for the number-score. As has been stated, a 
new stylus-point was used at the beginning of the acute deficiency. The 
supplemented group adapted to this change whereas the performance of 
the deficient group showed striking deterioration. The difference be- 
tween the experimental and the control group was statistically significant 
beyond the 1% level. 

The change from initial score to terminal score was used as a measure
of the overall effect of the experimental regimens. The t-tests presented
in Tables 5 and 7 were based on these differences. In doing this we
ignored the scores obtained during the period between the initial and the
terminal testing session. Both the analysis of variance and a regression
analysis of trends were applied to the full data. These analyses gave
substantially the same results as the t-tests applied to the initial-terminal
changes.


Discussion 


In general, the problem of the relationship between vitamin intake 
and level of motor fitness comprises two rather distinct questions: per- 
formance on supplemented diets and performance on restricted diets. All 
acceptable studies of the first category are in agreement that extra vita- 
min supplementation of ordinary ‘‘good” diets, i.e., diets not considered 
really deficient, does not lead to improved muscular performance in 
relatively normal persons (8, 12, 13, 22). 







In disturbed or special metabolic conditions vitamin supplementation 
may be beneficial. Such a positive effect has been reported for senile 
patients whose diet was heavily supplemented with B vitamins (23). 
In the study cited it appeared that there was at least a temporary re- 
sponse in psychomotor speed, coordination, and strength, but there is a 
possibility that the vitamin supplementation only helped to improve an 
otherwise poor hospital diet. 

The question is more complex with restricted diets. If the restriction
is sufficiently severe to produce obvious clinical deficiency, it is agreed
that motor performance will be impaired, though the precise degree and 
nature of the impairment has not been characterized quantitatively. 
The effect of any restriction may be dependent upon many factors, such 
as previous diet, duration of the experimental dietary regimen, level of 
activity, and climate, as well as peculiarity of the subjects and the parti- 
cular functions measured. The present confused state of the vitamin 
‘“‘requirements”’ problem reflects insufficient attention to these secondary 
factors. The present paper is concerned only with the capacity for rel- 
atively brief psychomotor performance of normal young men leading a 
moderately active life in a temperate climate. There are no published 
data directly comparable to those presented here. 

Experiments carried out on animals provide little information bearing 
directly on those aspects of performance which were studied in the present 
experiment. The animal experiments deal with such aspects of “fitness” 
as prolonged work of intact animals (18), spontaneous activity (4, 9), 
and disturbance of locomotion and posture (7, 20). 

Few of the studies on man have utilized adequate techniques to 
determine the effects of vitamin restriction on psychomotor performance. 
Reports of changes in total work capacity developing during subsistence 
on restricted intakes of the B vitamins emphasize decreased endurance 
in hard work (2, 3, 11). Such changes probably are dependent to a large 
extent on cardiovascular functions; in any case the reports cited provide 
little or no data on the more purely neuromuscular functions under 
present consideration. Simple clinical observations of the deterioration 
in motor behavior in non-standardized situations are valuable as clues 
for experimental work, but are difficult to evaluate (25, 26). If perform- 
ance deteriorates, and that is by no means always clear, it is uncertain to 
what extent this is a result of changes in neuromuscular functions or of 
uncontrolled changes in motivation. 

Williams and his colleagues expressed the opinion that measurements 
of performance capacity “are exceedingly difficult to make, for they 
involve not only the ability but also willingness to perform, and willing- 
ness is lost early in thiamine deficiency” (27, p. 72). It is true that a 













374 J. Brozek, H. Guetzkow, O. Mickelsen, and A. Keys 


performance score cannot be accurately divided into “capacity” and 
“willingness” components. However, when performance is characterized 
in quantitative terms and sufficient intra-individual and inter-individual 
controls are provided, this difficulty does not prevent experimental 
work in this area. It should be clear that, if “willingness” were 
more easily lost than capacity, we should be safe in concluding the absence 
of change in capacity when there is no change in total performance. 
This was the case in the period of partial restriction. One would expect 
that an unpleasant task, repeated month after month, would more likely 
be susceptible to changes ascribable to decreased “willingness” than a 
task equally often repeated but well liked. Note that performance on 
the back-lift dynamometer, a “back-breaking” task which was disliked 
by a majority of the subjects, did not show decrease over the prolonged 
partial restriction. There was no essential difference in this test as 
compared with the hand-grip dynamometer, although the men enjoyed 
the latter task. In the acute deficiency the performance in the majority 
of intellective tests, requiring a good deal of concentration and effort, 
remained unchanged; this demonstrates that “willingness” was not 
fundamentally affected (17). Yet there was deterioration of psychomotor 
performance. 

The present studies represent a development and continuation of 
previous investigations in this Laboratory. The studies of 1941-42 
disclosed no deterioration in any of the functions when moderately active 
men were maintained for 10 weeks on a diet providing only 0.23 mg. of 
thiamine per 1,000 Cal. (14). A battery of psychomotor tests was 
applied in experiments involving subsistence for 152 days on a diet 
providing only 0.31 mg. of riboflavin per 1,000 Cal. Again, there was no 
functional deterioration observed (15). Finally, there was no deteriora- 
tion in normal young men maintained for 14 days at hard work (4,600- 
4,800 Cal. daily) with a B vitamin intake per 1,000 Cal. at about one- 
fourth the National Research Council Recommended Daily Allowances 
(16). 

Such results might suggest that the particular tests used in the present 
experiment are insensitive to “stress.” This is disproved by positive 
results in other conditions such as fasting (24) and bed rest (6). The 
changes in psychomotor performance obtained in the acute deficiency 
period of the present experiment afford further evidence that these meth- 
ods are not insensitive when dietary vitamin insufficiency is present. 

The acute phase of the present experiment was designed to produce a 
physiological B-vitamin deficiency in a relatively short time by maintaining 
the experimental subjects on a diet extremely low in thiamine, 
riboflavin and niacin, and by a simultaneous high caloric output. Animal 
experiments have indicated that physical exercise speeds up the onset of 
deficiency symptoms in animals receiving a diet severely restricted in 
thiamine. In Guerrant and Dutcher’s experiments this period was 36 
days for the “forced exercise” group, and 44 for the “confined” group 
(9). In the acute deficiency period of the present experiment the tread- 
mill work was intensified and the total daily caloric expenditure increased 
to about 4,000 Cal. 

Studies of psychomotor performance by the techniques of factorial 
analysis indicate that motor skills are very specific; scores obtained in 
various motor tasks exhibit, on the average, positive but very low correla- 
tions (21). It is interesting that in acute deficiency the deterioration 
tended to affect all psychomotor functions. The degree of deterioration, 
however, varies. Since the number of subjects in the psychomotor tests 
was identical, the “degrees of freedom” are the same and the t-values for 
the different psychomotor tests are directly comparable. The relative 
sensitivity of the different tests to this stress is indicated indirectly by the 
t-value of the difference between the deficient and supplemented group. 
In this respect, deterioration in the pattern tracing test was most out- 
standing. 

A more direct approach to determine the relative deterioration of the 
various functions is to express the mean changes from initial to terminal 
performance level of the deficient group in terms of the percentage of the 
average scores at the start of acute deficiency. This method of evaluating 
the deteriorative changes points in the same direction as the comparison 
of the levels of statistical significance (Table 8). However, use of 
percentages in comparing changes which occurred in various functions 
involves assumptions whose effect it is difficult to evaluate (1). 
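
The percentage method described above can be sketched as follows. This is only an illustration: the function name and the example scores are hypothetical, and the sign convention assumes a test on which a higher score means better performance.

```python
def percentage_decrement(initial_scores, terminal_scores):
    """Mean initial-to-terminal loss as a percentage of the mean initial score.

    Assumes higher scores mean better performance, so a positive result
    is a decrement. For error or time scores the sign would be reversed.
    """
    if len(initial_scores) != len(terminal_scores) or not initial_scores:
        raise ValueError("need equal-length, non-empty score lists")
    n = len(initial_scores)
    mean_initial = sum(initial_scores) / n
    mean_change = sum(t - i for i, t in zip(initial_scores, terminal_scores)) / n
    return -100.0 * mean_change / mean_initial


# Hypothetical grip scores (kg) for four subjects, not data from the study.
initial = [50.0, 52.0, 48.0, 50.0]
terminal = [47.0, 50.0, 46.0, 47.0]
decrement = percentage_decrement(initial, terminal)
```

Because each test has its own unit and its own initial level, percentages computed this way are comparable across tests only under the assumptions questioned in the text.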

From the biological point of view, the most meaningful approach to 
the evaluation of the magnitude of changes which occurred in acute 
deficiency would be to relate them to the total range of performance in a 
given task in the general population. This is not possible at the present 
time. It would require use of rigorously standardized test techniques 
which in turn would make possible pooling of the data obtained by the 
various laboratories. In the area of psychomotor performance this lack 
of standardization is more glaring than in most areas of psychological 
and physiological testing. 

It might be asked whether the decrease in performance in the period 
of acute deficiency of thiamine, riboflavin, and niacin was not due simply 
to partial starvation. In answering this question, two types of evidence 
should be considered. First, when Kniazuk and Molitor (18) reduced 
the food intake of their vitamin supplemented rats to that of the deficient 
group, they found no significant reduction in work performance. The 
differences in the rate of recovery in our experiment also provide a good 
argument for a relative independence of this small and gradual loss of 
body weight and performance deterioration: the performance of the 
subjects in the deficient group recovered strikingly in the supplementation 
period whereas their weight increased only slightly. 

The psychomotor tests used in our experiment were all of short dura- 
tion. It is possible that more prolonged tests would have shown even 
more pronouncedly the effects of the vitamin deficiency. However, it is 
very difficult to devise satisfactory laboratory tests of work endurance 
of humans unless we use tasks which appreciably increase the energy 
consumption or lead to exhaustion of local muscle groups in a short time. 
The lack of adequate endurance tests is particularly unfortunate when 
we are interested in the application of the research findings to industrial 
nutrition. In modern industry there are few jobs producing physical 
exhaustion fatigue. Prolonged, repetitive work of moderate intensity is 
characteristic of the overwhelming majority of industrial jobs. In recent 
years the most successful approach to the problem has been made in 
experimental aviation medicine. The use of “miniature job situations” 
in the study of endurance in pilots should be paralleled in research on 
industrial physiology by the use of miniature industrial plants. This 
would extend the validity and practical importance of strictly laboratory 
studies. 


Table 8 
Sensitivity of Psychomotor Tests to the Stress of Acute Vitamin B Deficiency 

Test                                              Significance       Percentage Decrement 
                                                  of t-test          in Deficient Group 

Speed of finger movements                         not significant           3.4 
Strength, hand-grip                               not significant           4.9 
Speed, terminal tapping                           not significant           7.3 
Speed, initial tapping                            not significant           9.0 
Speed and coordination, ball-pipe                 5% level                  9.9 
Speed, toe reaction time                          5% level                 28.7 
Speed, gross body reaction time                   5% level                 60.2 
Coordination, pattern-tracing, number of errors   1% level                 61.5 
Coordination, pattern-tracing, time of errors     1% level                 72.0 




Summary and Conclusions 


1. The relationship between intake of B vitamins, particularly 
thiamine, and psychomotor performance was studied. This was one 
aspect of a comprehensive investigation of the biochemical, physiological, 
and psychological aspects of “fitness” as related to the vitamins of the B 
complex. 

2. Eight “normal” men, 20 to 32 years of age, served as subjects. 
They were maintained for 161 days on a diet providing, on the average, 
0.185 mg. of thiamine, 0.287 mg. of riboflavin and 3.71 mg. of niacin, 
per 1,000 Cal. The physical exercise was such that a daily intake of 
approximately 3,300 Cal. just maintained body weight. Four men re- 
ceived a daily supplement of 1.0 mg. thiamine, 1.0 mg. riboflavin and 10 
mg. niacin, while the other four received placebos. 

3. The period of partial restriction was followed by 23 days on a diet 
practically free of these vitamins. The subjects were re-grouped into the 
following four pairs: Restricted-deficient, restricted-supplemented, sup- 
plemented-deficient, supplemented-supplemented. The experiment 
ended with a 10 day period of thiamine supplementation. 

4. In the period of partial restriction there was no actual deterioration 
in any of the psychomotor measurements, including two strength tests 
(hand-grip and back-lift), speed of small hand movements (tapping), 
gross body reaction time, manual speed-and-coordination (ball-pipe test), 
and precise coordination (pattern tracing). However, the scores in the 
initial 10 sec. of tapping and in the ball-pipe test showed larger practice 
increments in the supplemented than in the restricted group; the differ- 
ence was small but statistically significant at the 5% level. 

5. In the period of acute deficiency there was evidence of marked 
deterioration. The difference between the experimental and the control 
group was significant at the 5% level in the tests of gross body reaction- 
time, toe reaction-time, and ball-pipe, and at the 1% level in pattern- 
tracing. The decrease in two-plate tapping and in the test of the speed 
of finger movements was not statistically significant. There was a very 
slight decrease in grip strength, approaching the 5% level of significance. 
The measured deterioration of psychomotor performance, present as 
soon as the 11th day, was one of the early symptoms of deficiency. 

6. Performance tended to return to a normal level after 10 days of 
liberal supplementation of the experimental diet with synthetic thiamine. 
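
The per-1,000-Cal. figures in item 2 can be converted to approximate daily intakes. The sketch below assumes a daily intake of exactly 3,300 Cal., although the summary gives this figure only as approximate; the variable and function names are illustrative.

```python
# Diet composition, supplement doses, and daily energy intake are the figures
# given in item 2 of the summary; 3,300 Cal. is treated as exact for simplicity.
DIET_MG_PER_1000_CAL = {"thiamine": 0.185, "riboflavin": 0.287, "niacin": 3.71}
SUPPLEMENT_MG_PER_DAY = {"thiamine": 1.0, "riboflavin": 1.0, "niacin": 10.0}
DAILY_CAL = 3300


def daily_intake_mg(supplemented):
    """Approximate daily intake of each vitamin in mg, with or without the supplement."""
    factor = DAILY_CAL / 1000
    intake = {}
    for vitamin, per_1000_cal in DIET_MG_PER_1000_CAL.items():
        extra = SUPPLEMENT_MG_PER_DAY[vitamin] if supplemented else 0.0
        intake[vitamin] = per_1000_cal * factor + extra
    return intake


restricted_intake = daily_intake_mg(supplemented=False)
supplemented_intake = daily_intake_mg(supplemented=True)
```

Under this assumption the restricted group received roughly 0.6 mg of thiamine per day and the supplemented group roughly 1.6 mg.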


Received July 12, 1946. 


References 
1. Anastasi, A. Practice and variability. Psychol. Monogr., 1934, 45, No. 204. 

2. Archdeacon, J. W., and Murlin, J. R. The effect of thiamine depletion and restoration 
on muscular efficiency and endurance. J. Nutrition, 1944, 28, 241-254. 

3. Barborka, C. J., Foltz, E. E., and Ivy, A. C. Relationship between vitamin B 
complex intake and work output in trained subjects. J.A.M.A., 1943, 122, 
717-720. 

4. Bloomfield, A. L., and Tainter, M. L. The effects of vitamin B deprivation on 
spontaneous activity of the rat. J. lab. clin. Med., 1943, 28, 1680-1690. 

5. Brozek, J. A new group test of manual skill. J. gen. Psychol., 1944, 31, 125-128. 

6. Brozek, J., Guetzkow, H., and Keys, A. Changes in psychomotor performance in 
bed-rest. Fed. Proc. Amer. Soc. exp. Biol., 1945, 4, 10. 

7. Everett, G. M. Observations on the behavior and neurophysiology of acute 
thiamine deficient cats. Am. J. Physiol., 1944, 141, 439-448. 

8. Foltz, E. E., Ivy, A. C., and Barborka, C. J. Influence of components of the vitamin 
B complex on recovery from fatigue. J. lab. clin. Med., 1942, 27, 1396-1399. 

9. Guerrant, N. B., and Dutcher, A. A. The influence of exercise on the growing rat 
in presence and absence of vitamin B1. J. Nutrition, 1940, 20, 589-598. 

10. Himwich, H. E. The role of the vitamins in brain metabolism. Res. Publ. Ass. 
nerv. ment. Dis., 1943, 22, 33-41. 

11. Johnson, R. E., Darling, R. C., Forbes, W. H., Brouha, L., Egana, E., and Graybiel, 
A. Effects of a diet deficient in part of the vitamin B complex upon men doing 
manual labor. J. Nutrition, 1942, 24, 585-596. 

12. Karpovich, P. V., and Millman, N. Vitamin B1 and endurance. New Eng. J. Med., 
1942, 226, 881-882. 

13. Keys, A., and Henschel, A. Vitamin supplementation of U. S. Army rations in 
relation to fatigue and the ability to do muscular work. J. Nutrition, 1942, 23, 
259-269. 

14. Keys, A., Henschel, A., Mickelsen, O., and Brozek, J. The performance of normal 
young men on controlled thiamine intakes. J. Nutrition, 1943, 28, 399-415. 

15. Keys, A., Henschel, A., Mickelsen, O., Brozek, J., and Crawford, J. H. Physiological 
and biochemical functions in normal young men on a diet restricted in 
riboflavin. J. Nutrition, 1944, 27, 165-178. 

16. Keys, A., Henschel, A., Taylor, H. L., Mickelsen, O., and Brozek, J. Absence of 
rapid deterioration in men doing hard physical work on a restricted intake of 
vitamins of the B complex. J. Nutrition, 1944, 27, 485-496. 

17. Keys, A., Henschel, A., Taylor, H. L., Mickelsen, O., and Brozek, J. Experimental 
studies on man with a restricted intake of the B vitamins. Am. J. Physiol., 
1945, 144, 5-42. 

18. Kniazuk, M., and Molitor, H. The influence of thiamine deficiency on work 
performance in rats. J. Pharm. exp. Therap., 1944, 80, 362-372. 

19. National Research Council. Recommended dietary allowances, revised 1945. 
Reprint and circular series No. 122, 1945. 

20. Prickett, C. O. The effect of a deficiency of vitamin B1 upon the central and 
peripheral nervous systems of the rat. Am. J. Physiol., 1934, 107, 459-470. 

21. Seashore, R. H., Buxton, C. E., and McCollom, I. N. Multiple factorial analysis 
of fine motor skills. Am. J. Psychol., 1940, 53, 251-259. 

22. Simonson, E., Enzer, N., Baer, A., and Braun, R. Influence of vitamin B (complex) 
surplus on capacity for muscular and mental work. J. Indust. Hyg. Toxicol., 
1942, 24, 83-90. 

23. Stephenson, W., Penton, C., and Korenchevsky, V. Some effects of vitamin B and 
C on senile patients. British med. J., 1941, 2, 839-844. 

24. Taylor, H. L., Brozek, J., Henschel, A., Mickelsen, O., and Keys, A. The effect 
of successive fasts on the ability of men to withstand fasting during hard work. 
Am. J. Physiol., 1945, 143, 148-154. 

25. Williams, R. D., and Mason, H. L. Further observations on induced thiamine 
(vitamin B1) deficiency and thiamine requirements of man: Preliminary reports. 
Proc. Staff Meet. Mayo Clinic, 1941, 16, 433-438. 

26. Williams, R. D., Mason, H. L., Power, M. H., and Wilder, R. M. Induced thiamine 
(vitamin B1) deficiency in man: Relation of depletion of thiamine to development 
of biochemical defect and of polyneuropathy. Arch. int. Med., 1943, 71, 38-53. 

27. Williams, R. D., Mason, H. L., and Wilder, R. M. Minimum daily requirements 
of thiamine of man. J. Nutrition, 1943, 25, 71-97. 

28. Ziegler, W. A. Minnesota rate of manipulation test. Minneapolis, Minn.: Educ. 
Test Bureau, 1939, 9 pp. 










Standardization of a Test of Hand Strength * 


M. Bruce Fisher 
Fresno State College, California 


and 


James E. Birren 
Northwestern University 


In the course of developing a battery of sensorimotor tests for the 
assessment of the efficiency of naval personnel under various conditions 
of stress, it was believed desirable to include the measurement of some 
function dependent on muscular strength. The extensive use of the 
hand dynamometer over the past hundred years (7, 10) suggested this 
device as the test apparatus. Furthermore, it is easily adaptable to a 
wide variety of testing circumstances, both in the field and in the laboratory. 
High reliability was desired since the proposed experiments in 
which the test would be used were to involve small numbers of subjects. 
An additional desired characteristic was that the test should involve a 
small amount of fatiguing work, at least more than usually occurs 
when the best of two or three grips is taken as the score. Dunlap’s 
endurance dynamometer test (2) meets this second requirement but he 
reports a low retest reliability. This finding of low reliability was confirmed 
in a study at the Naval Medical Research Institute. Other 
exploratory work led to the selection, for further standardization, of the 
procedure to be described. 


Procedure 


Test Administration. Each Smedley dynamometer was modified by 
affixing, over the regular scale, one which was divided into 3-kg. units, 
numbered in order. Thirds of a scale division (1 kg.) were indicated by 
smaller divisions. Although the major purpose of this scale is to facilitate 
easy reading, its placement permits the correction of any error in the 
original scale. In four dynamometers calibrated for this study, the 





* This report was prepared when the writers were on active duty in the U. S. Naval 
Reserve at the Naval Medical Research Institute, National Naval Medical Center, 
Bethesda, Md. The opinions expressed are those of the writers and are not to be con- 
strued as reflecting the policies of the Navy Department. 







Standardization of Test of Hand Strength 381 


calibration curves were approximately parallel straight lines, but the 
true zero points varied over a 3-kg. range on the original scales. 


In a test, each S adjusted the length of the movable stirrup to fit his hand, 
being warned not to make this adjustment too short. Each S was also in- 
structed to grasp the dynamometer with the fixed stirrup bearing across the 
heel (hypothenar eminence) of his hand so that all four fingers would have a 
chance to work, and so that painful pressure on the soft tissues between thumb 
and index finger would be avoided. To assist in the maintenance of consistent 
performance and facilitate reading of the scale between pulls, S always gripped 
the dynamometer with palm down and steadied it by holding the spring case 
between thumb and fingers of his other hand. This two-handed support 
usually increased the score two or three kilograms over that of a one-handed 
grip, but added to the consistency of performance. Either using talc, or 
wiping the hand on a towel between pulls, served to keep the palm dry during 
the test. 

The observer first described and demonstrated the procedure to be followed. 
S then adjusted the dynamometer to the preferred size for his hand and as- 
sumed a position of readiness to pull. The observer next said, “I will call 
numbers in order beginning at nine and going up. I will call a number every 
three seconds. Each time I call a number you must pull to the scale marker 
of that number. As soon as you have pulled up to it, release your grip, wipe 
your hand if you wish, and prepare for the next pull. Be sure you pull at 
least as high as the number I call. Squeeze promptly when I call the number. 
If you go as much as one number higher than I call, push the maximum indi- 
cator back to the one I called, before you grip again. This is to make sure you 
get up to the next number when I do call it. Keep up with the numbers as 
long as you possibly can. If you fail to reach a number, try once more when 
the next number is called. If you are still not up to the one I called, then put 
your dynamometer down to show you are through. You will do this test once 
with each hand. Use your right hand first.” 

The observer said, “Ready,” as he started a stop watch. At three seconds 
he said, “Nine”; at six seconds he said, “Ten”; etc. He continued counting 
until all the Ss who were being tested had reached their limits and then he 
recorded their scores. The procedure for four Ss on one hand, including 
recording, required 1½ to 2 minutes. 

The same procedure was repeated immediately with the left hand after any 
necessary adjustments of the size of the grip. Each score was read to the 
nearest third of a scale division (to the nearest kilogram). An S’s score for a 
test was the average of the scores of his two hands, in kilograms. 

In testing women, whose average grip is somewhat less than that of men, 
the observer began counting, and S began pulling, at “Six.” Other details of 
procedure remained the same. 


Subjects. In the determination of test reliability, a standardization 
group of 72 unselected male naval personnel were tested twice, with 6 to 
48 hours elapsing between tests. The first test scores of 97 additional 
male naval personnel are available from data gathered at the Medical 
Field Research Laboratory, Camp Lejeune, North Carolina (5). Thirty- 
two of these Ss were tested twice. The scores of 161 Waves in the medi- 
cal department were also obtained. Data on 648 industrial workers in 
two TNT plants were supplied by Passed Assistant Sanitarian (R) 
Robert B. Malmo, USPHS, Industrial Hygiene Research Laboratory, 







382 M. Bruce Fisher and James E. Birren 


National Institute of Health. The industrial workers were tested on the 
preferred hand only. Age statistics on the various groups of Ss are 
included in Table 1. 


Results 


Reliability. Reliability coefficients (r) in the standardization group 
(N = 72) are as follows: 


                                                   First    Second 
                                                   Test     Test 
(1) Right and left hand                             .79      .78 
(2) Preferred and non-preferred hand                .83      .85 
(3) Mean of the two hands, (2) corrected 
    for double test length                          .91      .92 
(4) Right hand, retest                                  .83 
(5) Left hand, retest                                   .83 
(6) Mean of two hands, retest                           .87 
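
The correction for double test length in row (3) is consistent with the Spearman-Brown formula applied to the correlations in row (2). The paper does not name the formula, so its identification here is an assumption, though it reproduces the tabled values; the function name is illustrative.

```python
def spearman_brown_doubled(r_half):
    """Reliability of a test of double length, given the correlation between halves."""
    return 2 * r_half / (1 + r_half)


# Row (2) correlations between preferred and non-preferred hand.
first_test = round(spearman_brown_doubled(0.83), 2)
second_test = round(spearman_brown_doubled(0.85), 2)
```

Here the two hands play the role of the two halves of the test, which is why the mean of the two hands is more reliable than either hand alone.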


The scores of the 32 Ss of the Camp Lejeune group on whom a second 
test is available yield a raw retest correlation coefficient of 0.84. 

Score Distributions. In Table 1 are distributions of first test scores 
for the groups tested. The data for the industrial groups are for the 
preferred hand only. The comparable preferred-hand value for the 
mean of the standardization group is 57.14 kg. (σ = 7.44); for the Camp 
Lejeune group, 55.97 kg. (σ = 6.81); and for the Waves, 34.86 kg. 
(σ = 3.07). These values are significantly larger (P < .01) than those 
of the comparable men’s and women’s industrial groups. 

Because of the differences in the age distribution of the naval and 
industrial groups (Table 1), and in view of the significant regression of 
dynamometer score on age (3, 8), a more accurate estimate of the difference 
between the naval and industrial populations was desirable. Accordingly, 
two new mean values for the industrial men were calculated, 
each corrected for age distribution to match one of the samples of naval 
men. To secure one of these corrected means, the average dynamometer 
score of each five-year age range of the industrial group was weighted 
by the number of men in that age range in one of the naval samples. 
Both t-values between preferred-hand scores (calculated by comparing 
each group of naval men with the industrial group, when corrected for 
age in this fashion) showed a highly significant superiority (P < .01) of 
the naval personnel. The Waves were similarly compared with the 
industrial women, and the former were also found to be significantly 
superior (P < .01) when matched for age. 
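
The age correction described above is a weighted mean, sketched below. The age bands, mean scores, and counts are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
def age_adjusted_mean(industrial_mean_by_age, naval_count_by_age):
    """Industrial mean score re-weighted to the age distribution of a naval sample.

    Each age band's industrial mean is weighted by the number of naval men
    in that band. Every band present in the naval sample must also be
    present in the industrial data.
    """
    total = sum(naval_count_by_age.values())
    if total == 0:
        raise ValueError("naval sample is empty")
    weighted_sum = sum(
        industrial_mean_by_age[band] * count
        for band, count in naval_count_by_age.items()
    )
    return weighted_sum / total


# Hypothetical five-year age bands: industrial mean scores (kg) and naval counts.
industrial_means = {"20-24": 55.0, "25-29": 54.0, "30-34": 52.0}
naval_counts = {"20-24": 50, "25-29": 15, "30-34": 5}
adjusted = age_adjusted_mean(industrial_means, naval_counts)
```

With these invented figures the adjusted mean is pulled toward the youngest band, where most of the naval sample falls.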

Table 1 
Distribution of Hand Dynamometer Scores 

                      Naval Personnel           Industrial Personnel 
                      (Mean of two hands)       (Preferred hand score) 
                      Men        Women          Men        Women 

[The score intervals and frequency columns are illegible in the source; the 
one surviving frequency column reads 1, 5, 15, 17, 30, 16, 5, 6, 1 (total 96).] 

N                     169        161            552        96 
Mean                  54.0       33.1           53.1       33.6 
[label illegible]     54.0       35.0           49.3       35.0 
σ                     6.5        4.0            7.0        4.7 

Age of subjects 
Mean                  21.8       22.7           34.4       32.4 
[label illegible]     20.5       22.1           33.5       31.4 
σ                     4.6        3.3            8.7        8.4 
Range                 17-46      20-36          18-68      20-57 

* These statistics were calculated from the raw scores and will not agree exactly with 
those derived from the above distributions. 


Improvement with Practice. Three groups of Ss have been given the 
hand dynamometer test over a period of days so that improvement with 
practice can be followed (Fig. 1). Data of two of these groups (curves 
A and C in Fig. 1) represent performance during a period of training prior 
to the beginning of an experiment. Data for the other group (curve B) 
are combined from an experimental and a control group, the former of 
whom underwent an experimental regime which did not produce significant 
performance differences from the latter. There is considerable 
agreement among the three curves as to slope during the first dozen 
trials in spite of differences in frequency of testing. The average improvement 
of 40 Ss between the second and twelfth test periods is 0.31 
kilograms per period. This number is the slope of the line fitted to these 
eleven points by the method of least squares. 


[Figure 1: practice curves A, B, and C. Horizontal axis: test period, 1 to 35. 
Vertical axis: label illegible, graduated from 45 to 60.] 

Fig. 1. Hand dynamometer practice curves. Curve A: three tests per day, 6 hours 
apart, for four days; 18 Ss. Curve B: one test per day for 45 days, except a 6-day 
interval between tests 1 and 2, and a 3-day interval between tests 2 and 3; 12 Ss. Curve 
C: one or two tests per day in a period of 17 days (no test on some days); 10 Ss. 
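
The least-squares slope referred to in the text can be computed as below. The eleven mean scores are not given in the paper, so the values used here are invented and made to rise by exactly 0.31 kg per period; only the method is illustrated.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    return sum_xy / sum_xx


# Test periods 2 through 12 give the eleven points mentioned in the text.
periods = list(range(2, 13))

# Hypothetical mean scores (kg), constructed to rise 0.31 kg per period.
mean_scores = [50.0 + 0.31 * (p - 2) for p in periods]
slope = least_squares_slope(periods, mean_scores)
```

Fitting a line through all eleven means, rather than taking the difference between the twelfth and second periods alone, makes the estimate less sensitive to fluctuation at either end.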


Discussion 


The procedure for use of the hand dynamometer developed in this 
study has shown itself, from the point of view of both split-half and retest 
reliability, to be sufficiently precise for use with small groups of Ss. The 
higher reliability of this procedure as compared with others using the 
hand dynamometer is presumed to result from: (1) the use of a number of 
trials of increasing difficulty so that the S has opportunity to “warm up,” 
get used to the task, and still have several opportunities to exert himself 
to the limit; (2) the small amount of fatigue which working near the limit 
for several trials produces, and which, therefore, requires the final squeeze 
to be made under a standardized stress; and (3) the nearly complete 
elimination of pain as a factor in determining the score. No special 
significance is attached to the exact time interval (3 sec.) or the increment 
(3 kg.) between squeezes in the procedure developed. 

The phenomena associated with practice on the test are those to be 
expected with repeated brief exercise of any muscle or muscle group. In 
the early part of a practice series, minor changes in the manner of pulling 
may be of some significance in score changes, but these are uncommon. 
The relatively large change between first and second test in one experi- 
mental group (curve B, Fig. 1) is probably to be accounted for by the 
fact that these Ss took their first test under somewhat unfavorable 
physiological circumstances but had five days of rigidly controlled living 
and physical exercise before the second. 


Applications of the Test 


The dynamometer test has “face validity” but its inclusion in a 
battery of performance tests should nevertheless be justified by some 
more specific idea of what it measures. Observations made on subjects’ 
behavior in more than 2,000 tests with the hand dynamometer at the 
Naval Medical Research Institute, under many conditions, are in agree- 
ment with factorial findings (2, 4, 9) that there is more than simple 
strength in such a test and that motivational and attitudinal elements 
usually play a part in determining a score. More objective indication 
of the validity of the test as a measure of response to severe physiological 
stress is available in the data of two experiments conducted at the 
Institute, and another, using the same procedure, carried on elsewhere. 

A group of 105 men were tested on the hand dynamometer before 
engaging in a “fatigue run” lasting 18 hours. This consisted of nearly 
continuous marching, calisthenics, and active military exercises, and 
involved a minimum of hand work. About half the men were forced to 
drop out before the end of the run on account of fatigue. Thirty-two of 
these men were retested on the dynamometer one to three hours after- 
ward and before they had any sleep. 

The mean decrease in score of this group was 2.03 kilograms, with a 
standard error of 0.25 kilograms. This decrease in score is to be compared 
with a mean increase from first to second test of 0.88 kilograms (standard 
error, 0.40 kg.) in the standardization group. The difference between 
these two changes is highly significant statistically. Only 38 per cent 
of the standardization group did not show improvement between the 
first and second test, whereas 75 per cent of those who failed to finish the 
fatigue run showed no improvement. In this experiment the physiolog- 
ically determined ability to perform work was presumably adversely in- 
fluenced and motivation either was not sufficient to overcome the physi- 
ological decrement or was concomitantly decreased. 
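
The paper does not say which test of significance was applied to the difference between the two changes. A minimal sketch, assuming the two groups are independent and using a simple critical ratio (difference divided by its standard error), shows why the difference is so clear; the function name is illustrative.

```python
import math


def critical_ratio(change_a, se_a, change_b, se_b):
    """Difference between two independent mean changes divided by its standard error."""
    return (change_a - change_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Standardization group: mean gain of 0.88 kg (standard error 0.40 kg).
# Fatigue-run group: mean loss of 2.03 kg (standard error 0.25 kg).
ratio = critical_ratio(0.88, 0.40, -2.03, 0.25)
```

The two changes differ by 2.91 kg, which is about six times the standard error of the difference.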

In a second experiment, 17 Ss were in a closed chamber for two and a 
half days, during which time the carbon dioxide concentration built up 
to five per cent and the oxygen decreased from 21 to 12 per cent (1). 
During this time the Ss were tested repeatedly on a number of “mental” 
and relatively complex sensorimotor functions which showed only small 
or insignificant decrements. The hand dynamometer, however, showed 
a decrease in score which was highly significant statistically. The mean 
score just before release from the stress was 1.06 σ below the mean of the 
pre-experimental distribution and only one S showed an increase in score 
at that time over his pre-experimental value. In this experiment, con- 
comitant measures of cardiovascular and respiratory functions showed 
several rather severe disturbances. These dynamometer data are inter- 
preted to mean that the Ss’ “willingness to exert effort” (2, 9) was unable 
to compensate for the physiological decrement suffered. That such 
“willingness” was present was inferred from intimate observations of the 
Ss and from the fact that on other tests where motivation was important 
and strength was not, the Ss maintained or nearly maintained their 
previous performance levels. Mean hand dynamometer performance 
was still 0.56 o below the mean of the pre-experimental test four to six 
hours after release from the chamber and only 5 of the 17 Ss had exceeded 
their pre-experimental performance. A number of the respiratory and 
cardiovascular functions measured at the same time also failed to recover. 

In a third experiment (6) environmental temperature was the variable 
and was carefully controlled for a period of 44 days. There was no 
significant difference in daily tests between an experimental and a control 
group on a wide variety of sensorimotor tasks, and none but minor and 
easily reversible differences on a number of physiological measures. The 
hand dynamometer also showed no differences between the experimental 
and control groups. The curves for the two groups were essentially 
parallel throughout the experiment and are combined in Figure 1, curve 
B. The departures from a smooth curve are most simply and adequately 
explained as motivation changes; a lapse in interest in the middle of the 
experiment produced the long plateau, with revived enthusiasm being 
reflected in improved performance as the end of a long confinement 
approached. These trends were shown to be unrelated to any physiolog- 
ical changes during the same periods. 

The first two of these three experiments indicate that this test can 
measure changes in hand strength when the internal physiological en- 
vironment is modified beyond the limits for which motivation can com- 
pensate. The third experiment indicates the converse: when the physi- 
ological state is stabilized over a considerable period, motivation becomes 
paramount in determining day-to-day variations in individual and group 
scores. 


Summary 


1. A test procedure for the hand dynamometer was developed which 
is satisfactory for inclusion in a battery of performance tests. In this 







procedure, S squeezes the dynamometer every three seconds, starting 
with a grip of 27 kg. and increasing his grip 3 kg. with each subsequent 
attempt. The test continues until he is unable to achieve a level re- 
quired, the score being the highest level reached. 

2. The split-half reliability coefficients for this test procedure, pre-
ferred vs. non-preferred hand, were 0.91 and 0.92 for the first and second
tests, respectively (N = 72). The retest reliability of the test, ad-
ministered twice within two days, was 0.87 (N = 72).

3. Improvement with practice on this hand dynamometer test has 
been followed in three groups for 12, 14, and 38 practice periods under 
relatively normal conditions. Mean improvement in the early periods 
of practice for 40 Ss was 0.31 kg. per period. 

4. Several groups of data involving various stress conditions indicate 
that the test has some degree of validity, in that test scores parallel the 
cardiovascular and respiratory responses and reported general fatigue of 
Ss under such conditions. 
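
The test procedure summarized in item 1 can be sketched as a short routine. This is only an illustration: the function name is hypothetical, the input is taken to be the grip force in kilograms actually produced on each successive attempt, and the result for a subject who fails the first 27 kg level is an assumption, since the summary does not say how that case is scored.

```python
def dynamometer_score(grips_kg, start_kg=27, step_kg=3):
    """Return the highest required level reached, or None if none was reached.

    grips_kg: force produced on each successive squeeze (one every 3 seconds).
    The required level starts at start_kg and rises by step_kg per attempt.
    """
    required = start_kg
    score = None  # assumption: no level reached means no score
    for grip in grips_kg:
        if grip < required:
            break  # the test stops at the first level S cannot achieve
        score = required
        required += step_kg
    return score

# S meets the 27, 30 and 33 kg levels, then fails the 36 kg level.
example_score = dynamometer_score([28, 31, 34, 35])
```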


Received September 24, 1945. 


References 


1. Consolazio, W. V., Fisher, M. B., Pace, N., Pecora, L. J., Pitts, G. C., and Behnke,
A. R. The effects on man of prolonged exposure to increased carbon dioxide and
decreased oxygen pressures. Naval Medical Research Institute, Research Project
X-349, Revised Report, 1945.

2. Dunlap, J. W. Tests of the “ability to take it.” Civil Aeronautics Administra-
tion, Division of Research, Report No. 11, Washington, D. C., 1943 (restricted).

3. Fisher, M. B., and Birren, J. E. Age and hand strength (in preparation).

4. Howell, T. H. An experimental study of persistence. J. abnorm. soc. Psychol.,
1933, 28, 14-29.

5. New, W. N., et al. Validation of physical fitness tests by evaluation of performance
of subjects. Medical Field Research Laboratory, Camp Lejeune, N. C., Re-
search Project X-526.

6. Pace, N., Fisher, M. B., Birren, J. E., Pitts, G. C., White, W. A., Consolazio, W. V.,
and Pecora, L. J. A comparative study of the effect on men of continuous
versus intermittent exposure to a tropical environment. Naval Medical Re-
search Institute, Research Project X-205, Report 2, 1945.

7. Quetelet, A. Sur l’homme et le développement de ses facultés. Paris: Bachelier,
Imprimeur-Libraire, 1835, 2 vols.

8. Todd, T. W. Skeleton and locomotor system. In: E. V. Cowdry (Ed.). Prob-
lems of ageing (2nd Ed.). Baltimore: Williams and Wilkins, 1942 (Chapter 12).

9. Wherry, R. J. Preliminary report on the construction of a test battery of per-
sistence. Supplement I in (1) (restricted).

10. Whipple, G. M. Manual of mental and physical tests. Part I: Simpler processes
(2nd ed.). Baltimore: Warwick and York, 1914, pp. 100-129.













The Time Appreciation Test * 


John N. Buck 
Chief Psychologist, Lynchburg State Colony, Virginia 





As Dr. Grace H. Kent ¹ has stated, one of the principal purposes of
an “emergency test” is to provide the psychiatrist with a psychometric 
tool that he can employ when no qualified psychological assistance is 
available. 

There are many situations, however, in which such tests can be of 
very real service to the psychologist himself. Wherever and whenever
“time is of the essence,” so to speak, the “emergency test” has a definite
place. In the clinic and in the mental hospital it may be used as an 
initial screening device and to help to determine what further psycho- 
metric procedure may be indicated; it may also be used at regular inter- 
vals as a follow-up check upon the patient’s progress. In personnel 
work it may be employed by the interviewer as an aid in making a rough 
appraisal of the applicant’s qualifications for a given position; also to 
check upon the veracity of the applicant’s statements concerning his 
educational attainments. The value of the emergency test to the armed 
forces has been amply attested to in the journals in the last few years. 


Standardization 


The Time Appreciation Test was first given (as a group test) to ap- 
proximately 675 white persons ranging in life age from 8 to 23 years, and 
in educational status from the third grade in Grammar School to the 
third year in college, in the city schools of Lynchburg, Va.; the Hughes 


* The author wishes to acknowledge his especial indebtedness to the following whose 
fine courtesy and wholehearted cooperation made possible the establishment of the 
norms: Mr. Omer Carmichael, Superintendent of City Schools, Lynchburg, Va.; Mr. 
E. W. Paylor, Superintendent, Hughes Memorial School, Danville, Va.; Captain Harry 
Carmine, Fork Union Military Academy, Fork Union, Va.; Dr. Oscar DeWolf Ran- 
dolph, Rector, Virginia Episcopal School, Lynchburg, Va.; Dr. William Hinton, Assistant 
Professor of Psychology, Washington and Lee University, Lexington, Va.; Mrs. Dorota 
Rymarkiewiczowa, Chief Psychologist, University of Virginia Hospital, Charlottesville, 
Va. To Mrs. D. E. Mack go deep thanks for her valuable assistance in the statistical 
treatment of the data. Space does not permit adequate expression of gratitude for the 
help afforded by many others. 

1 Kent, Grace H., Oral test for emergency use in clinics; Mental Measurement Mono- 
graphs, Baltimore: Williams & Wilkins; 1932. 




Memorial School, Danville, Va.; Fork Union Military Academy, Fork 
Union, Va.; Virginia Episcopal School, Lynchburg, Va.; and Washington 
and Lee University, Lexington, Va. 

The tentative norms thus established on the basis of chronological 
selection were later amended in certain instances as the result of evidence 
produced by administering the test individually to some 350 white persons 
(ranging in life age from 7 to 75 years, and in educational status from ‘‘no 
schooling” to eight years of college work) whose intellectual level had 
previously been appraised by use of the Stanford-Binet, the Wechsler- 
Bellevue Scale, or some other psychometric device of comparable status. 

In passing, it should be stated that no significant sex differences were 
found. 

The user of the Time Appreciation Test must always bear in mind the 
fact that the accuracy of the norms has not been established for the 
negro or the foreign born. 


Description 


The test consists of 30 questions relating to various aspects of time: 
the first seven questions deal with immediate orientation; questions No. 
17, 18 and 19 are about holidays; questions No. 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, and 30 call for definition of certain phases of time; the remain- 
ing nine questions deal with the reduction of large time units into smaller 
units (and question No. 16 usually demands mathematical calculation of 
a more than elementary type in the individual test situation). 

Tabulation shows that the “rank” order of difficulty of the questions
for the Standardization Group was as follows (from the easiest to the 
most difficult): 1, 8, 3, 4, 6, 9, 12, 10, 13, 14, 2, 5, 18, 7, 23, 21, 15, 25, 17, 
16, 19, 26, 20, 22, 28, 11, 27, 24, 29 and 30. 
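
The grouping of the questions and their order of difficulty can be set side by side in a few lines. The sketch below transcribes both from the description above; the category labels and the `mean_rank` helper are illustrative names introduced here, not terms from the test itself.

```python
# Question numbers in each group, as described in the text.
CATEGORIES = {
    "immediate orientation": range(1, 8),
    "reduction of time units": range(8, 17),
    "holidays": range(17, 20),
    "definitions": range(20, 31),
}

# Rank order of difficulty for the Standardization Group, easiest first.
DIFFICULTY_ORDER = [
    1, 8, 3, 4, 6, 9, 12, 10, 13, 14, 2, 5, 18, 7, 23,
    21, 15, 25, 17, 16, 19, 26, 20, 22, 28, 11, 27, 24, 29, 30,
]

def mean_rank(category):
    """Average difficulty rank (1 = easiest) of the questions in a category."""
    ranks = [DIFFICULTY_ORDER.index(q) + 1 for q in CATEGORIES[category]]
    return sum(ranks) / len(ranks)

for name in CATEGORIES:
    print(f"{name}: mean difficulty rank {mean_rank(name):.1f}")
```

On this tabulation the orientation questions rank as easier on average than the definition questions, in line with the order in which the form presents them.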


Administration and Scoring 


The Time Appreciation Test ² is intended for use with adults ³ and
with children of not under ten years life age. It may be used as an indi-
vidual test, or as a group test (with subjects who have had at least a
fourth grade education). All other things being equal, the test is more
useful and more accurate when given individually, for the examiner then
has an opportunity to obtain information from the subject’s response-
pattern (decisive or indecisive; rapid or slow; labile or calm, etc.) and
verbal-expression (pedantic or colloquial; verbose or terse, etc.) that is


² If the Time Appreciation Test is used as one of a battery of tests, it is suggested
that it should not be the last test given.
³ Persons of life age sixteen years or more.













denied him in group examination, and the examiner can clear up, by 
additional questioning, any ambiguity in the subject’s answers. 


JNB TIME TEST

(Copyright, 1943, John N. Buck)

[Identification fields at the head of the form are illegible in the source.]

1. Is it morning or afternoon now?
2. About what time is it by the clock now?
3. What day of the week is it?
4. What month is it?
5. What day of the month is it?
6. What year is it?
7. What season of the year is it?
8. How many days are there in a week?
9. How many minutes are there in an hour?
10. How many hours are there in a day?
11. How many days are there in a month?
12. How many months are there in a year?
13. How many seasons are there in a year?
14. How many seconds are there in a minute?
15. How many months are there in a season?
16. How many seconds are there in an hour?
17. In what month is Thanksgiving Day? On what day in that month does it always come?
18. In what month is Christmas? On what day in that month does it always come?
19. In what month is Hallowe’en? On what day in that month does it always come?
20. What is a decade?
21. What is a century?
22. What is a fortnight?
23. What does anyone mean when he says: “Nine A.M.” and “Nine P.M.”?
24. What words do those initials “A.M.” and “P.M.” stand for?
25. What does anyone mean when he says: “The year 450 B.C.” and “The year 450 A.D.”?
26. What words do those initials “B.C.” and “A.D.” stand for?
27. What is a time zone?
28. Name the time zones in the United States.
29. What is Greenwich mean time?
30. What does anyone mean when he says “Vernal Equinox” and “Autumnal Equinox”?

Score: Correct: .......... Half-correct: .......... Total Points: .......... C.A. .......... Equiv. M.A. ..........


Fig. 1. Sample of time appreciation test.


When the test is given to but one subject at a time, the examiner re- 
tains the form sheet; reads the questions to the subject, and records the 
subject’s answers verbatim. When the test is given to a number of 
persons simultaneously, each subject has his own form sheet; reads the 
questions to himself; and writes his own answers (to the right of the 
printed questions). See Figure 1. 

No explanation or alteration of the wording of the questions (except 
as indicated later), or of their sequence is permissible. The entire thirty 
questions should be asked. 

There is no time limit; usually the test can be given and scored in ten
minutes ⁴ or less.

Under the scoring system, as outlined in the following pages, two 
points are given for each correct answer; and in a number of instances one 
point is allowed for an answer that is partially correct. 

After he has made certain that neither a calendar, nor a watch or 
clock is available for inspection by the subject, the examiner will say: 
“Mr. X, I have some questions here that I’d like to ask you. First I am 
going to ask you a very easy one... .” 

As a rule no other introductory remarks will be needed, since it is 
assumed that rapport will have been established before the test is used. 


Directions for Scoring 


Quest. 1. Is it morning or afternoon now? Score two points for correct 
answer. No partial credit is allowed. 

Quest. 2. About what time is it by the clock now? Score two points for
any answer that is within 30 minutes of the actual time (either way). Ex-
ample: “1:46” at 1:25. Score one point for any answer that is within 60
minutes (either way) of the actual time. For instance: “2:10” or “12:35”
at 1:25.

Quest. 3. What day of the week is it? Score two points for the correct
answer. In group testing the number (as: “second,” for Monday) will suffice,
but in the individual test situation, the day must be named correctly. Score
one point for an answer that is not more than one day wrong. For example:
on Tuesday, either “Wednesday” or “Monday” would merit one point’s
credit.

Quest. 4. What month is it? Score two points for the correct answer. In 
group testing, it will suffice for the subject to give the number (as: “Seventh,” 
for July), but in individual testing the examiner will allow full credit only for 
the proper name, in response to a repetition of the original question. Score 
one point for the month just ended (on the first day of a new month) as: 
“July” on the first day of August, or for the month to come (on the last day 
of a month) as: “August” on the 31st of July. 

Quest. 5. What day of the month is it? Score two points for the correct 
answer. Score one point for an answer that is not more than three days wrong 
(either way). For example: the subject who states on the 31st of March that 


⁴ Time consumption greatly in excess of this will usually be significant of a disturb-
ance involving more than a deficiency of intellect.










it is the 28th, 29th, 30th of March or the 1st, 2nd, or 3rd of April will receive a
point. Note, though, that he will not receive credit at this point if he says it 
is the 3rd of March (not April). 

Quest. 6. What year is it? Score two points for the correct year. Score
one point for the year just ended (on the 1st of January) or the year to come
(on the 31st of December). Note: If the examiner can obtain a reply to this
question only by adding “It is 1900 and what?”, one point is to be given if the
subject responds to this “leading” question with the correct year.

Quest. 7. What season of the year is it? Score two points for the correct
season, or for either winter or spring on the 21st of March; for either spring or
summer on the 21st of June; for either summer or fall on the 22nd of Sep-
tember; for either fall or winter on the 21st of December. Score one point
for the season to come in the month of the change but before the change has
actually taken place (as: “Winter” on the 5th of December) or for the season
just ended, provided not more than three days have elapsed after the change
(as: “Fall” on the 23rd of December). Note: If the subject states, for instance:
“It is the hunting season,” the examiner will continue with: “Yes, I know
that, but what season of the year is it? Is it so-and-so or so-and-so?” (for
“so-and-so” the examiner will substitute the names of the seasons bracketing
the actual season; in winter, to illustrate, it would be “Is it fall or spring?”),
and if the subject gives the correct season he is to be given one point’s credit.

Quest. 8. How many days are there in a week? Score two points for either
“Seven” or “Six days and Sunday.” If the subject answers “Six” or “Six
days” or “Six weekdays,” the examiner will continue with: “Does that include
Sunday?” and give the subject one point if he replies that it does not. If the
subject says, “I don’t know,” the examiner will ask the subject to name the
days of the week. If the subject can do so, the examiner will repeat the
original question and if the subject says, “Seven,” the examiner will allow one
point’s credit therefor.

Quest. 9. How many minutes are there in an hour? Score two points for
“Sixty.” No partial credit is allowed.

Quest. 10. How many hours are there in a day? Score two points for
“Twenty-four” or “Twelve hours in a day; twelve in a night”; score one point
for “Twelve.” Note: If the subject says, for instance, “There are eight hours
in a working day” (or something similar), the examiner will continue with:
“But how many hours are there in a full day?” and the examiner will allow
one point for “Twenty-four.”

Quest. 11. How many days are there in a month? Score two points for
“28, 29, 30, and 31,” or “Anywhere from 28 to 31” (or an equivalent response).
Score one point for any answer containing a 30 and a 31, and a 28 or a 29, but
not all four. Note: If the subject says, “Usually 30,” or “It varies,” or “They
have a different number,” the examiner will say “Explain,” and will allow
credit in accordance with the calibre of the reply.

Quest. 12. How many months are there in a year? Score two points for
“Twelve.” No partial credit is allowed.

Quest. 13. How many seasons are there in a year? Score two points for
“Four.” No partial credit is allowed.

Quest. 14. How many seconds are there in a minute? Score two points for
“Sixty.” No partial credit is allowed.

Quest. 15. How many months are there in a season? Score two points for
“Three.” No partial credit is allowed. If the subject says, “That varies
with the part of the world you are in,” the examiner will ask, “How many
months are there in a calendar season?”, and he will allow two points credit
for the answer “Three.”

Quest. 16. How many seconds are there in an hour? Score two points for 
“Three thousand, six hundred,” or “Thirty-six hundred.” This answer must 










be arrived at without the aid of pencil and paper (or their equivalent) when the 
test is given individually; in group testing the subject may work out the prob- 
lem on his Form Sheet—in a surprising number of instances this computation 
is incorrectly done! No partial credit is allowed. 

Quest. 17. In what month is Thanksgiving day? On what day in that month
does it always come? Score two points for “In November; on the last Thurs-
day,” or “November, on the Fourth Thursday.” If the subject says “In
November, on the third Thursday,” the examiner will say, “Yes, President
Roosevelt did proclaim the third Thursday as Thanksgiving, but on what day
in the month did the old-fashioned Thanksgiving always come?” and he will
allow full credit for “Last,” or “Fourth.” Score one point if November is
named correctly, but the day of the month is not. No credit is to be allowed
if the month is named incorrectly even though the day itself is given as the
last Thursday.

Quest. 18. In what month is Christmas? On what day in that month does
it always come? Score two points for “December 25th” (or its equivalent).
Score one point if the month is named correctly, but the day is not. No credit
is to be given if the month is not named correctly.

Quest. 19. In what month is Hallowe’en? On what day in that month does
it always come? Score two points for “The last day of October,” “October
31st” (or their equivalent). Score one point if the month is named correctly,
regardless of what day is given. No credit is to be granted if any month but
October is named.

Quest. 20. What is a decade? Score two points for “Ten years,” or “Ten
days.” No partial credit is allowed.

Quest. 21. What is a century? Score two points for any answer indicating
that a century is a period of 100 years. No partial credit is allowed.

Quest. 22. What is a fortnight? Score two points for “Two weeks,” or
“Fourteen days,” or “Fourteen nights.” Score one point for “Fifteen days”
or “Half a month.”

Quest. 23. What does anyone mean when he says: “Nine A.M.” and “Nine
P.M.”? Score two points for any answer indicating that the former precedes
noon; the latter follows it. The answers: “Afternoon,” “Evening,” or “Night,”
all suffice for the latter. Score one point if either “Nine A.M.” or “Nine
P.M.” is defined correctly, but not both.

Quest. 24. What words do those initials “A.M.” and “P.M.” stand for?
Score two points for “Ante Meridian and Post Meridian.” Note: Misspelling
of the words in group testing does not rob the subject of full credit if the
examiner feels certain that the fault lies in the spelling alone. Note: If the
subject merely repeats his “Morning and night” response to Question 23, and
there is any reason to believe he actually knows the correct answer, the exam-
iner should repeat Question 24 casually. Score one point for either “Ante
Meridian” or “Post Meridian,” but not both; or for both “Before noon” and
“After noon.” No credit is to be given for “ante morning” and “post morning.”

Quest. 25. What does anyone mean when he says: “The year 450 B.C.” and
“The year 450 A.D.”? Score two points for any answer indicating that the
former date was 450 years before the birth of Christ, the latter 450 years after
Christ’s birth. Note: In the individual test, if the subject states that the two
terms mean, respectively, “Before and after Christ,” full credit cannot be
allowed until and unless additional questioning by “What do you mean by
after? After birth or after death?” elicits the fact that the subject means
after birth. But in group testing “Before and after Christ” receives full credit
since further questioning is impossible. Score one point if either term is
properly explained, but not both.

Quest. 26. What words do those initials “B.C.” and “A.D.” stand for?
Score two points for “Before Christ” and either “Anno Domini” or “In the










year of our Lord.” Note: Misspelling of the words “Anno Domini” does not
cost the subject full credit (in group testing) if his intent is obvious. Score
one point if either set (but not both) is defined correctly.

Quest. 27. What is a time zone? Score two points for an answer stating,
in effect, that the world is divided arbitrarily into 24 longitudinal belts of
approximately equal size (15° each), with the time in adjacent belts one hour
apart. Score one point for an answer that defines a time zone as an area of
the earth’s surface in all parts of which the time is the same simultaneously.
Note: the examiner must be careful to appraise the calibre of the answers on
the basis of the subject’s knowledge rather than on his ability to express himself
concisely.

Quest. 28. Name the time zones in the United States. Score two points for:
“Eastern, Central, (Rocky) Mountain, and Pacific,” named in any order.
Score one point for three of the four zones named above (one of the three must
be the zone in which the subject is at the moment); or for an answer naming
the correct four zones and also a fifth and incorrect zone (such as Midwestern,
or Western). The inclusion or omission of “Standard,” “Daylight Saving,”
“War Time,” as elaboration (as, for example, “Eastern War Time, Central
War Time,” etc.) does not affect the scoring.

Quest. 29. What is Greenwich mean time? Score two points for any answer
that says, in effect, that all standard times are based on their longitudinal
relationship to the meridian on which is located the Greenwich Astronomical
Observatory in England. Score one point for “That’s the time at Greenwich,
England” (Greenwich Village will not suffice), or “Greenwich is located at 0°
longitude,” or something equivalent thereto. The differential credit factor
in this question and in question 27 is the all-inclusiveness or localization of the
subject’s answer.

Quest. 30. What does anyone mean when he says: “Vernal Equinox” and
“Autumnal Equinox”? Score two points for any answer that accurately defines
“vernal,” “autumnal,” and “equinox.” It is not necessary that the subject
be specific as to dates; neither is it necessary for him to define the sun’s position
in relation to the equator. Score one point for any answer correctly defining
“vernal” and “autumnal,” or “equinox,” but not all three.
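
Two of the scoring rules above are simple tolerance checks and can be written out as a sketch. The function names are hypothetical. Comparing clock answers on a 12-hour dial (so that 12:35 and 1:25 are 50 minutes apart) is an assumption of this sketch, and the answer to Question 5 is assumed to be recorded as a full calendar date so that answers in the adjoining month can be compared.

```python
from datetime import date

def score_clock_estimate(answer, actual):
    """Question 2: two points within 30 minutes, one point within 60.

    answer, actual: (hour, minute) pairs read from a 12-hour clock face.
    """
    to_minutes = lambda hm: (hm[0] % 12) * 60 + hm[1]
    diff = abs(to_minutes(answer) - to_minutes(actual))
    diff = min(diff, 720 - diff)  # shortest way round the dial (assumption)
    if diff <= 30:
        return 2
    return 1 if diff <= 60 else 0

def score_day_of_month(answer, actual):
    """Question 5: two points if exact, one point within three days."""
    days_off = abs((answer - actual).days)
    if days_off == 0:
        return 2
    return 1 if days_off <= 3 else 0

# The worked examples from the directions, with the actual time 1:25
# and the actual date the 31st of March (the year is arbitrary).
print(score_clock_estimate((1, 46), (1, 25)))
print(score_day_of_month(date(1946, 4, 3), date(1946, 3, 31)))
```

Rules that depend on follow-up questioning by the examiner (Questions 6, 7, 8, 10 and others) do not reduce to a check of this kind.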


As soon as the credit points have been summed up, the examiner can
turn to the tentative norms in Table 1 and convert the point score into a
mental age equivalent score and, if the patient is 16 years of age or older,
to an adult intelligence quotient and the equivalent adult intelligence
level classification as well.


Validity


Tables 2 and 3 offer evidence as to the validity of the Time Apprecia-
tion Test. The test’s comparatively high degree of correlation with the
Stanford-Binet seems quite satisfactory for an emergency test.


Reliability 


It is believed that the difference between the reliability coefficients 
shown in Table 4 is due more to the difference between the types of sub- 
jects who made up the groups than to the difference between the methods 
of administration of the test. It is scarcely to be expected, however, that 
the test-retest reliability coefficient will be as high for group administra- 












Table 1 
Tentative Norms for the Time Appreciation Test 


Note: All four columns are for use with subjects of life age sixteen or above. The 
first two columns (reading from left to right) may be used with subjects of life age ten 
and over for converting point scores to mental age equivalent scores. 














Time Appreciation               Adult   Adult Classi-
Test Point Score        M.A.    I.Q.*   fication
2 4:6 30 
3 5:0 33 
4 5:6 37 Imbecile 
5 6:0 40 
6 6:6 43 
7 7:0 47 
12 7:6 50 
16 8:0 53 
21 8:6 57 Moron 
25 9:0 60 
28 9:6 63 
31 10:0 67 
33 10:6 70 
35 11:0 73 Borderline 
37 11:6 77 
39 12:0 80 
41 12:6 83 Dull average 
42 13:0 87 
43 13:6 90 
44 14:0 93 
45 14:6 97 
46 15:0 100 Average 
47 15:6 103 
48 16:0 107 
49 16:6 110 
50 17:0 113 Above average 
51 17:6 117 
52 18:0 120
53 18:6 123 Superior 
54 and up above 18:6 above 123 





* Estimated according to the method of Terman and Merrill, Measuring intelligence;
Boston: Houghton Mifflin Company, 1937. When comparing the Time Appreciation
Test I.Q. of a subject of life age 30 and above with his Wechsler-Bellevue I.Q., it is
suggested that the examiner in computing the Time Test I.Q. allow for the age factor
by using the “Table of Approximate C.A. (Adult M.A.) Denominators for Binet
Scales . . .” as described in the third edition of The measurement of adult intelligence,
by David Wechsler (Baltimore: Williams and Wilkins, 1944).
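
The conversion in Table 1 can be sketched as a lookup. The mental ages and I.Q.s below are transcribed from the table. Two things are assumptions of this sketch rather than statements in the article: the adult I.Q. column is reproduced by dividing mental age by a fixed adult age of 15 years, and a point score falling between two tabled scores is read at the next lower tabled score. Scores below 2 or above 53 return nothing, since the table gives no exact value for them.

```python
# Point score -> (M.A. years, M.A. months, adult I.Q.), from Table 1.
TABLE_1 = {
    2: (4, 6, 30), 3: (5, 0, 33), 4: (5, 6, 37), 5: (6, 0, 40),
    6: (6, 6, 43), 7: (7, 0, 47), 12: (7, 6, 50), 16: (8, 0, 53),
    21: (8, 6, 57), 25: (9, 0, 60), 28: (9, 6, 63), 31: (10, 0, 67),
    33: (10, 6, 70), 35: (11, 0, 73), 37: (11, 6, 77), 39: (12, 0, 80),
    41: (12, 6, 83), 42: (13, 0, 87), 43: (13, 6, 90), 44: (14, 0, 93),
    45: (14, 6, 97), 46: (15, 0, 100), 47: (15, 6, 103), 48: (16, 0, 107),
    49: (16, 6, 110), 50: (17, 0, 113), 51: (17, 6, 117), 52: (18, 0, 120),
    53: (18, 6, 123),
}

def mental_age(point_score):
    """Return (years, months) for a point score, or None outside 2..53."""
    if point_score < 2 or point_score > 53:
        return None
    nearest = max(s for s in TABLE_1 if s <= point_score)  # assumption
    return TABLE_1[nearest][:2]

def adult_iq(years, months):
    """100 * M.A. / 15 years, rounded to the nearest whole number."""
    total_months = years * 12 + months
    return (100 * total_months + 90) // 180  # integer rounding, 180 = 15 * 12

print(mental_age(46), adult_iq(*mental_age(46)))
```

Every tabled row agrees with this ratio, which suggests, but does not prove, that the column was computed this way.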











































































Table 2 


Pearson Coefficients of Correlation between the Time Appreciation Test and 
Three Other Intelligence Tests 


Note: Group A was composed of students (9th through 12th grades) of the Virginia 
Episcopal School of Lynchburg, Virginia. Both the Otis and the Time Appreciation 
Test were given as group tests. Groups B, C, D, E, and F were composed of in- and 
out-patients of the Lynchburg State Colony, Colony, Virginia. The patients in ques- 
tion were mentally deficient and/or epileptic, or suspected of being either or both. The 
tests in each instance were given individually. Group G was composed of mentally 
deficient in- and out-patients of the Lynchburg State Colony, Colony, Virginia (there 
were no epileptics in this group). In each instance the tests were given individually. 














Group  Chron. Age    Test                                  No. Cases    r            P.E.r
A      12 to 19      Otis, Gamma, Form A, I.Q.             [illegible]  [illegible]  [illegible]
B      16 and over   Wechsler-Bellevue Total I.Q.          59           .74          ±.041
C      16 and over   Wechsler-Bellevue Verbal I.Q.         22           .67          ±.080
D      16 and over   Wechsler-Bellevue Performance I.Q.    24           .77          ±.056
E      16 and over   Stanford-Binet (Form M) I.Q.          32           .88          ±.027
F      Below 16      Stanford-Binet (Form M) I.Q.          36           .78          ±.044
G      Below 16      Stanford-Binet (Form M) I.Q.          31           .84          ±.035
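
The probable errors in Table 2 can be approximated from r and the number of cases. The article does not state how they were computed; the sketch below assumes the conventional formula P.E. = 0.6745 (1 − r²) / √N, which agrees with the legible tabled values to within about .001.

```python
import math

def probable_error_of_r(r, n):
    """Conventional probable error of a Pearson r (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Groups B to G: (r, number of cases, tabled P.E.).
rows = [(.74, 59, .041), (.67, 22, .080), (.77, 24, .056),
        (.88, 32, .027), (.78, 36, .044), (.84, 31, .035)]
for r, n, tabled in rows:
    print(f"r={r:.2f} N={n}: computed {probable_error_of_r(r, n):.3f}, tabled {tabled:.3f}")
```

The small discrepancies may reflect rounding of r before tabulation.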





tion as for individual administration unless, somehow, the subjects to 
whom the test is given as a group can be prevented from “comparing 
notes” afterwards. 


Table 3 
Comparison of Measures of Central Tendency and Variability 











Group  Chron. Age    Tests                                 No. Cases  Mean  Range            σ
A      12 to 19      Otis, Gamma, Form A, I.Q.             103        103   82-124           12.90
                     Time Appreciation Test I.Q.                      98    75-123+          9.00
B      16 and over   Wechsler-Bellevue Total I.Q.          59         61    36-90            12.37
                     Time Appreciation Test I.Q.                      61    37-83            9.15
C      16 and over   Wechsler-Bellevue Verbal I.Q.         22         67    52-79            8.05
                     Time Appreciation Test I.Q.                      61    40-80            9.34
D      16 and over   Wechsler-Bellevue Performance I.Q.    24         59    34-[illegible]   14.00
                     Time Appreciation Test I.Q.                      59    47-80            8.00
E      16 and over   Stanford-Binet (Form M) I.Q.          32         48    22-73            13.98
                     Time Appreciation Test I.Q.                      53    26-80            13.60
F      Below 16      Stanford-Binet (Form M) I.Q.          36         48    26-68            11.80
                     Time Appreciation Test I.Q.                      51    27-76            12.38
G      Below 16      Stanford-Binet (Form M) I.Q.          31         47    26-68            11.70
                     Time Appreciation Test I.Q.                      50    27-76            13.40













Table 4 


Test-Retest Reliability for the Time Appreciation Test

I. Group Administration of Test

Note: The subjects in this group were 48 student nurses at the University of Virginia
Hospital. Seventeen days intervened between test and retest.

              Mean   Range     σ       r            P.E.r
First Test    98     77-120    10.68   [illegible]  ±.037
Second Test   104    80-123    11.08


II. Individual Administration of Test

Note: The subjects in this group were 20 patients at the University of Virginia
Hospital or the Lynchburg State Colony (1 psychoneurotic; 1 paranoid condition;
1 hypoparathyroidism; 4 epileptic; 13 mentally deficient). The test-retest time interval
averaged 33 days; ranged from 4 days to 16 weeks.

              Mean   Range     σ       r            P.E.r
First Test    61     30-100    16.09   [illegible]  ±.012
Second Test   64     33-107    18.27








Interpretation 


It is assumed that in the great majority of instances the subject’s
point score will represent, albeit crudely, the subject’s effective level of
intelligence at the moment that he takes the test. This level, of course,
will not always be his “normal” effective level.

It appears that one may assume that answers that fall into one or 
more of the following categories are indicative of potential ability: (a) 
answers whose incorrectness appears to be due to lack of attention rather
than to any real lack of knowledge (to illustrate: the answer, “365” to
question No. 12); (b) mistakes in answer to one or more of the first seven 
questions where the score on the remaining twenty-three questions is of 
good calibre; (c) answers to questions No. 20, 21, 22, 27, 28, 29 and 30 
that are not of high enough calibre to merit either one or two points but 
that show some familiarity with the material involved. 

The examiner will find it worthwhile to total point scores for the first 
ten, the second ten, and the third ten questions separately: the point 
score of the average subject is highest for the first ten, progressively lower 
through the next two tens. Any change in this order suggests the pres- 
ence of some disintegrating factor and the need for more careful examina- 
tion of the subject. 
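
The check on the three blocks of ten questions can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the item scores are invented, and treating a tie between blocks as a departure from the usual order is an assumption, since the text says only that the totals are "progressively lower."

```python
from typing import List


def block_totals(points: List[int], block: int = 10) -> List[int]:
    """Sum item scores in consecutive blocks (first ten, second ten, third ten)."""
    return [sum(points[i:i + block]) for i in range(0, len(points), block)]


def order_is_typical(totals: List[int]) -> bool:
    """True when each block total is strictly lower than the one before it.

    Counting ties as atypical is an assumption made for this sketch.
    """
    return all(a > b for a, b in zip(totals, totals[1:]))


# Hypothetical item scores for the thirty questions (0, 1, or 2 points each).
scores = [2] * 8 + [1] * 2 + [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
totals = block_totals(scores)
print(totals, order_is_typical(totals))
```

A result of `False` would correspond to the changed order that, per the paragraph above, calls for more careful examination of the subject; it is not a diagnosis in itself.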

Such frankly bizarre responses as: “Some kind of weeks are good,” 
in reply to question No. 3, or “Same place I live in,” for question No. 18, 
strongly suggest the presence of a psychosis. 

































Answers that are accompanied by frequent, “. . . , isn’t it?”; “. . . 
could be . . .”; “I think there are . . .”; etc., are usually, as they appear 
to be, indicative of anxiety. 

Experience so far has indicated that the average subject’s point score 
tends to hold up well with advancing life age (though not so well as his 
vocabulary score, for example). The point score appears to be rather 
sensitive to psychic disturbances resulting in aprosexia. 

It must always be borne in mind that this is an “emergency test” and
no more. A competent psychologist should not need to be told that no
positive diagnosis may ever be made solely on the basis of the subject’s
score on this or any other emergency test.


Summary 


The Time Appreciation Test is composed of thirty questions all of 
which touch in some way upon some phase of time. 

In its favor appear to be the following points: 1. Ease of administration;
2. Simplicity of scoring; 3. Economy of time; 4. The fact that no
special testing equipment is required; 5. Its relative freedom from cultural
artifacts (much of the information sought is of a type that seems to be
acquired by the average individual in the normal course of growing up);⁵
6. The questions are apparently inoffensive to the average subject.

Opposed to this are the following: 1. A probable sampling inadequacy 
in the groups on which the norms were established; 2. Overweighting of 
several of the last ten questions with some “special knowledge” factors; 
3. The fact that several of the questions are “seasonal” in type (that is,
their relative difficulty is not uniform throughout the calendar year). 

Ultimately it may prove possible to correct certain of these defects. 

It is hoped that with all its faults the Time Appreciation Test will 
prove itself a useful addition to the emergency test group. 


Received August 19, 1945. 


5 Illiterates seem to be penalized somewhat less severely on this than on most verbal
tests. 








Relation of Iowa Silent Reading Test Scores to Measures of 
Scholastic Aptitude and Achievement * 


Richard W. Kilby 
Woman's College of The University of North Carolina 


As a further step toward understanding what the Iowa Silent Reading
Test measures and its relation to college grades, a random sample of 100
Yale freshmen was studied.¹ Correlations were run between the I.S.R.
Test (total and subtest scores) and final grades and various aptitude
measures. The randomizing procedure consisted of listing the class
alphabetically and using every fourth case for the sample.
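
The selection procedure just described (alphabetical listing, every fourth case) can be sketched as below. The function name, the roster, and the choice of starting position are all assumptions made for illustration; the text does not say which case was taken first. Strictly speaking, a draw of this kind is a systematic sample rather than a simple random one.

```python
from typing import List, Sequence


def every_kth(roster: Sequence[str], k: int = 4, start: int = 0) -> List[str]:
    """Sort the roster alphabetically, then keep every k-th name from `start`."""
    ordered = sorted(roster)
    return list(ordered[start::k])


# Hypothetical roster; the names are placeholders, not data from the study.
roster = ["Baker", "Adams", "Clark", "Evans", "Davis", "Frank", "Hayes", "Grant"]
print(every_kth(roster))
```

Applied to a class of about 400, the same rule would yield the 100 cases reported.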

The correlations between the I.S.R. Test scores and final grades are
presented in Table 1, and the correlations between the I.S.R. Test scores
and various measures of aptitude and ability are given in Table 4.

Correlations between the I.S.R. Test scores and final grades will be
considered first. All subtests except Rate and Paragraph Comprehension
are significantly correlated with average final grade. Rate bears a
very low and insignificant relation to grade standing; Paragraph
Comprehension falls a little short of significance. Imus, Rothney, and Bear
(1, pp. 67, 122) found a similar absence of relation between grades and
rate of reading as measured by the I.S.R. Test, and concluded that
academic performance is seldom improved by increasing the rate of
reading. Such a conclusion is not warranted, without qualification, by
the low correlation found; before that conclusion is drawn it
would have to be demonstrated that an increase in rate of reading, according
to several different types of rate measure, is not correlated with
improvement in grades.

In view of the low correlation of rate of reading with final grades, an
investigation of the validity of the I.S.R. Rate subtest seemed advisable.
This was made possible by studying a different group of students who
had participated in a remedial reading program and had used homogeneous
practice materials (Equated Transfer Selections, distributed by the
Harvard Film Service, Biological Laboratories, Harvard University)


* This paper is part of a dissertation which was presented for the degree of Doctor 
of Philosophy in Yale University. I am indebted to J. Richard Wittenborn, Clinical 
Psychologist, for permission to use the research facilities of the Division of Student 
Mental Hygiene. 

1 Research upon the Iowa Silent Reading Test has recently been reviewed by Triggs (3). 



for speed reading during practice sessions and had been given a final
I.S.R. Test to measure improvement. Since the students had recorded
on progress sheets their reading rates for the practice material, it was
possible to determine the reliability of the practice materials and use
them for validating the I.S.R. Rate subtest. Fifty-nine cases were
involved. The average of the last six practice selections read before
taking the test was used in the correlation. The reliability coefficient of


Table 1 
Correlations between Iowa Silent Reading Test Scores and Final Grades** 














                            I.S.R. Subtests
                      1                                   Median
                                                          Standard
Grade          N   a-c  b-d   2    3    4    5    6    7  Score
Ave. of finals 10 Boe er we we DB si 40* 
English 7 2 Sb tse erm Fe 2 .31* 
Social sciences 26 17 3 43° 22 56. 57°. 28°. 3B .57* 
Sciences SS © 6 BA ae Oe .26 
Languages 4 08 0 2 ws 30° 35° 13 2 25 
Mathematics 8 06 19 383° 14 2 @3 13 .26 





* Asterisks indicate those coefficients that are at least four times their probable errors. 
** The probable error of any of the correlations in the above table may be readily 
determined from the following table: 


Range of r’s having the respective P.E.r’s:

  N      .09       .08       .07       .06       .05       .04

 48    .00-.35   .36-.47   .48-.57   .58-.65
 64              .00-.32   .33-.47   .48-.58
 83                        .00-.34   .35-.50   .51-.62
 85                        .00-.32   .33-.49   .50-.61
 86                        .00-.32   .33-.49   .50-.61
 97                        .00-.22   .23-.44   .45-.58   .59-.69
 99                        .00-.20   .21-.43   .44-.57   .58-.69
100                        .00-.18   .19-.42   .43-.57   .58-.69
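
The ranges in the table above agree closely with the classical probable error of a correlation coefficient, P.E.r = 0.6745(1 - r²)/√N. The sketch below assumes that formula; the source does not state which one was used, and the function names are illustrative. It also applies the asterisk criterion from the table footnote (a coefficient at least four times its probable error).

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def meets_four_pe(r: float, n: int) -> bool:
    """Asterisk criterion from the footnote: |r| at least four times its P.E."""
    return abs(r) >= 4.0 * probable_error_r(r, n)


pe = probable_error_r(0.40, 100)
print(round(pe, 3), meets_four_pe(0.40, 100))
```

Because the table gives probable errors rounded to two decimals, a coefficient near a range boundary may be classified differently by the unrounded formula than by the table.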





the practice material (the last three odd selections against the last three
even selections) was .90. The coefficient of correlation between rate on
the practice material and rate on the I.S.R. Rate subtest was .69 (P.E.
±.05), and when corrected for attenuation became .75. This correlation
indicates a reasonably high validity for the I.S.R. Rate subtest, but it
should be pointed out that the validating selections were of approximately
the same level of difficulty as the I.S.R. Rate subtest, and rate of reading
was measured in the same way, i.e., number of words read each minute.










It would be safest to interpret the correlation as indicating that the rate 
subtest has reasonably high validity for measuring that type of rate of 
reading. 
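
For readers unfamiliar with the correction for attenuation mentioned above, a minimal sketch follows. It assumes Spearman's standard formula, r_xy / √(r_xx · r_yy); the text does not show its working, and the function and parameter names are illustrative.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Spearman's correction: observed r divided by sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Correcting only for the reported reliability of the practice material (.90).
one_sided = correct_for_attenuation(0.69, 0.90)
print(round(one_sided, 2))
```

Correcting for the practice material alone gives about .73. Under this formula the reported .75 would follow if the Rate subtest's own reliability were also entered at roughly .94; that figure is an inference from the numbers, not a value stated in the text.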

The correlations of grades in various types of courses with the Rate
subtest are uniformly low, indicating that a student’s rate of reading as
measured by this test bears no relation to his success in any type of
subject. On the other hand, subtests 2 (Directed Reading), 4 (Word
Meaning), and 5 (Sentence Meaning) tend to be uniformly related to grades in
all courses. Various other subtests are significantly related to grades in
one type of course but not another.

Some light is cast on the validity of the I.S.R. Test by the correlations
of the various grades with the total reading score; the degrees of relation- 
ship shown are as would be expected for a test whose purpose is to meas- 


Table 2 


Zero Order and Partial Correlations between the Iowa Silent Reading Test, 
Scholastic Aptitude Test, and Average Final Grade 





1. Iowa Silent Reading Test 





- 3. 
N = 100 a-c 3 6 7 Md. 8.A.T. 





2. Ave. final grade O83 . d 20 . P 22 31 4 44 
3. 8.A.T. ear P 33. F 24 12 43 
Tin oO . 19 144 20. .27 





ure general reading ability. At the same time, the correlations show
that this test measures in large part those reading abilities needed for 
courses in English and the social sciences rather than in the physical 
sciences, mathematics, or languages. 

The possibility that all subtests other than the Rate subtest were
measuring a verbal factor rather than reading ability and that this factor
accounted for the relation to final grades made it advisable to find the
partial correlations between the I.S.R. Test and grades when scholastic
aptitude, as measured by the Scholastic Aptitude Test, is held constant.
The partial correlations of the I.S.R. subtests with grades when General
Prediction score (predicted grade) is held constant were also determined,
since General Prediction score would tend to account for most of the
factors related to grades except reading ability. The sets of partial
correlations, presented in Tables 2 and 3, indicate that some of the
abilities measured by the I.S.R. Test continue to be related to final
grades when other factors are partialled out. Regarding first the inter-










Table 3 


Zero Order and Partial Correlations between the Iowa Silent Reading Test, General 
Prediction Score, and Average Final Grade (N = 100) 

















Iowa Silent Reading Test 
1 
a-c b-d 2 3 4 5 6 7 Md. G.P.
2. Av. final grade 03 2 45 30 42 40 22 31 40 .69 
3. Gen. prediction Ma 18 2B AOR Kh Be 
Tiss Eh MW B48 £2 Be Se Oe 





relations between S.A.T., final grade average, and I.S.R. subtests, certain
subtests have a higher partial correlation with grades than do others; the
rate section of subtest 1 has no relation, while subtests 2 (Directed Reading),
4 (Word Meaning), and 7 (Location of Information) have the highest
relation to grades.

Regarding the interrelations between the second group of variables
(General Prediction score, final grade average, and I.S.R. subtests),
subtests 2, 4, and 7 again have the highest correlation with grades when
predicted grade is partialled out, and Rate shows the same absence of
relation.

Further knowledge as to what the I.S.R. Test measures may be gained
from comparing it with various measures of aptitude or ability. This














Table 4 
Correlations between Iowa Silent Reading Test Scores and Measures of 
Aptitude and Ability** 
                                 I.S.R. Subtests
                           1                                          Median
                                                                      Standard
Measure            N    a-c   b-d    2     3     4     5     6     7   Score
Gen. prediction   100   .04   .18   .42*  .32*  .42*  .50*  .14   .17   .33*
S.A.T.            100   .07   .14   .87*  .33*  .42*  .60*  .24*  .12   .43*
M.A.T. 100 .O .18 .18 14 .24* 22 02 11 .08 
Yale Spatial- 
Mech. Apt. 0 Fw ae we. ae Mm. Um CO .82* 
C.E.E.B. Eng- 
lish Essay 8 .10 08 21 .16 11 2 4 8 .21 





* Asterisks indicate those coefficients that are at least four times their probable 
errors. 

** The probable error of any of the correlations in the above table may be readily 
determined from the table of probable errors in the footnote of Table 1. 











was done by correlating it with the College Entrance Examination Board
English Essay Examination, the Scholastic Aptitude Test, the Mathematical
Aptitude Test, the combined Spatial and Mechanical Aptitude
Tests of the Yale Battery, and General Prediction score. The resulting
correlations are presented in Table 4. It is evident from these data that
this reading test does measure in varying degrees something other than
the things which these other tests purport to measure. The correlation
of subtest 5 (Sentence Meaning) with S.A.T. is significantly higher than
are those of subtests 1 (Rate and Comprehension), 6 (Paragraph
Comprehension), and 7 (Location of Information).

While reading tests have been used mainly in entrance examination 
batteries to locate poor readers, it would be well to know whether or not 
the same reading test score might also be of value in improving the pre- 
diction of college success. At present the average multiple correlation 


Table 5 


Intercorrelations, Partials, and Multiple R between the Iowa Silent Reading Test, 
Average Final Grade, and General Prediction Score* 








                                2.                3.
N = 100                   I.S.R. Median    Ave. Final Grade

1. General prediction          .33               .69
2. I.S.R. median                                 .40
r23.1                                            .26
r13.2                                            .66
R3(12)                                           .80





* See footnote, Table 1, for the probable error of any of the above zero order corre- 
lations. 


coefficient of combinations of tests for predicting college scholarship is 
about .65, and the range of the middle 50 per cent of coefficients that have 
been reported is from .60 to .70 (2, 4). Predictive coefficients need to be 
much higher, but investigators feel that until grades are made more 
reliable, predictive efficiency cannot increase much if at all. However, if 
it can be shown that reading ability is correlated with grades when other 
predictive factors have been partialled out, then it may be said to have 
value as a predictive addition. 

It is unlikely that most reading tests would make a predictive addition,
because they duplicate too closely what is already being measured
by a scholastic aptitude test, as is indicated by the generally high
correlations between reading and aptitude tests. The I.S.R. Test, however,
is not so highly correlated with aptitude tests as are other reading tests;
for the sample studied here a correlation of only .43 with the Scholastic
Aptitude Test was found (Table 4). This correlation indicates that the
I.S.R. Test is measuring something in addition to what it measures in
common with the S.A.T. and suggests that the I.S.R. Test may be
independently related to grades. To investigate this possibility the
intercorrelations, partials, and multiple correlation for the three
variables (I.S.R. Test median, General Prediction score, and average final
grade) were computed. These data are presented in Table 5. The
correlation of I.S.R. Test median with final grade average is .40, and the
independent relation (the partial correlation) with average final grade
when the duplication with General Prediction score has been eliminated
is .26. The multiple R involving the combination of General Prediction
score and I.S.R. Test median is .80, which is high for this type of
relationship. These results suggest that the addition of the I.S.R. Test median
would be a valuable contribution to a predictive battery since it is
independently related to grades. However, these findings must be verified
on a much larger population before they may be accepted.
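
The partial and multiple correlations discussed above can be recomputed from the three zero-order coefficients with the standard textbook formulas. The sketch below assumes those formulas and uses illustrative function and variable names; the source does not show its computation.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion y with two predictors, 1 and 2."""
    r_sq = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_sq)


# Zero-order values as printed in Table 5 (rounded to two decimals).
r_gp_isr, r_gp_grade, r_isr_grade = 0.33, 0.69, 0.40

# I.S.R. median vs. grade, General Prediction held constant.
print(round(partial_r(r_isr_grade, r_gp_isr, r_gp_grade), 2))
# General Prediction vs. grade, I.S.R. median held constant.
print(round(partial_r(r_gp_grade, r_gp_isr, r_isr_grade), 2))
# Grade predicted from both measures together.
print(round(multiple_r(r_gp_grade, r_isr_grade, r_gp_isr), 2))
```

From the rounded coefficients these formulas give approximately .25, .64, and .71, against the printed .26, .66, and .80. The two partials are within what rounding of the inputs could explain; the multiple R is not, and the sketch makes no attempt to reconcile it.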

If the I.S.R. Test is found to be of predictive value it is likely that
certain subtests, because of higher independent correlations with grades, 
will be of greater value than others. The data already presented in 
Tables 2 and 3 suggest that subtests 2, 4, and 7 may be of value because 
they have the highest partial correlations with grades, while others of the 
subtests, because of their low partial correlations, are of no predictive 
value and might well be omitted. 


Summary 


1. All I.S.R. subtests except the Rate subtest are related to final
grades.

2. The Rate subtest possesses reasonably high validity when validated
upon material of about the same level of difficulty using the same
method of measurement.

3. The correlation of grades in all types of courses with rate of reading
as measured by the Rate subtest was uniformly low.

4. The I.S.R. Test was found to have a higher correlation with grades
in English and the social sciences than with grades in the physical
sciences, mathematics, and languages. The former correlations were not
significantly higher than the latter.

5. The I.S.R. Test possesses an independent relation to final grades
when other variables are partialled out.

6. The I.S.R. Test measures something other than is measured by
various aptitude tests.

7. Use of the I.S.R. Test median in a battery for predicting scholastic
success may increase the accuracy of prediction.

8. Certain I.S.R. subtests (2, 4, and 7) are more highly correlated with
grades than are others, when related variables are partialled out.


Received September 14, 1946. 


References 


1. Imus, H. A., J. W. M. Rothney, and R. M. Bear. An evaluation of visual factors in 
reading. Hanover, New Hampshire: Dartmouth College Publications, 1938, 
pp. 144. 

2. Segal, David. Prediction of success in college. G.P.O. Bull., 1934, No. 15, Office of
Education. 

3. Triggs, Frances O. Remedial reading. Minneapolis: University of Minnesota Press,
1943, pp. 219. 

4. Wagner, M. E. Prediction of college performance. University of Buffalo Studies, 
1934, 9, 125-144. 










The Relationship of College Board Examination Scores and 
Reading Scores for College Freshmen 


Helen E. Peixotto 
Wheaton College, Norton, Massachusetts 


This study is an attempt to find a method of preliminary screening of 
poor readers by means of the College Board Examinations for freshmen 
entering a liberal arts college. There are various screening tests for 
reading, but since these tests are usually given by the colleges, the results 
are not available at the beginning of the first semester. Therefore it 
would be beneficial to a college remedial reading program to be able to 
make a preliminary survey of reading ability among the freshmen at the 
opening of the first semester. 


Definition of Terms 


The tests used in this study are the College Board Examinations and 
the Cooperative English Test C2: Reading Comprehension. The mean- 
ing of the various scores derived from these tests is important for any 
interpretation of the data. This is particularly true of the reading scores 
where the technique of administration influences the meaning of the 
score, and although many investigators in making tests have used similar 
terminology the technique of obtaining the scores has varied widely. 
Discussion of this problem of terminology with reference to speed of 
reading has recently been reported (1, 3, 4). Therefore the meanings of 
the scores from the tests used in this investigation are given and, wherever
possible, as direct quotations by the authors of the tests.


College Board Examinations: 


“The scores on both the achievement tests and the Scholastic Aptitude Test 
give the schools information valuable to diagnostic and guidance purposes” (5). 


In other words the Scholastic Aptitude Test score presumably measures
aptitude to do college work. A definite statement regarding this
test is difficult to find, but from the reports its validity is based on success
of students in college work as reported by various colleges. The definition
of its function is further complicated by the fact that the Scholastic
Aptitude Test is divided into verbal and mathematical sections, each
yielding separate scores, but all studies and reports refer to a Scholastic
Aptitude Test score without indicating which Scholastic Aptitude Test
score has been used. The following quotations may give some idea of
the ability tapped by the test by means of comparisons:


“The Social Studies Test samples all that a student has learned throughout
his schooling and in his outside reading in the social studies. It is a measure
of cumulative achievement and growth. It is definitely not a measure of
accomplishment in one particular course.

. . . the Social Studies Test is suitable for students with different amounts
of training so long as the results are interpreted in the light of the amount of
preparation. Even without such interpretation, Social Studies Test scores, as
we have seen, correlate as highly with freshmen grades as the Scholastic
Aptitude Test. If the amount of study is taken into account the test becomes a
still better predictor of success in college.

All in all, the evidence from this ~~ a points to the fact that the Social 
Studies Test is a measure of ability and past achievement in the field of social 
studies, and that it is also a good index of future achievement in College” (7). 


For the English Essay Test, too, it is difficult to find any exact 
statement of just what is being measured. The best description seems 
to be contained in the following phrase, “accuracy and clarity in writing” 
(8). 

The Cooperative English Test C2: Reading Comprehension is best 
understood by the following quotations: 


“Vocabulary Score indicates the extensiveness of the individual’s word
knowledge.”

“Speed of Comprehension Score represents the product of the rate at which
the individual has attempted to comprehend the test material and his success
in comprehending it. It is not . . . merely a measure of the number of words
read without regard to the thought content.”

“Level of Comprehension Score provides a measure of the ability of the
student to comprehend materials of increasing difficulty at the rate at which
he chooses to work. It is a measure of ‘power’ or ‘depth’ of comprehension,
i.e., the extent to which a pupil is able to grasp the full import of what
he reads.”

“Total Reading Score is a composite score in which each of the other three
scores has equal weight. It may be regarded as a measure of linguistic ability
and should prove to be an excellent index of scholastic aptitude” (2).


Procedure 


Each freshman in this investigation took College Board Examinations 
in one of the several centers throughout the country. The College Board 
Examinations, as stated above, consist of an aptitude examination from 
which there are two scores, verbal and mathematical. The verbal score 
is the only one reported in this study. There are also certain achieve- 
ment tests, but all the students do not take all of these. All the students 
in this group took the English Essay Test, and their scores are, therefore, 
utilized in this study.

Some students take these examinations twice, at the end of the junior 
year in high school and again at the end of the senior year. Others take 







them only once, either at the end of the junior or senior year in high 
school. The reports of the College Board are indefinite as to the relative 
effects of growth and practice on retest scores, but suggest that perhaps 
15 to 20 points should be deducted for practice effects when the tests are 
taken one year apart (6). Therefore the first score achieved by each 
student on these tests is used. 

During the first week of college each freshman was given the “‘Cooper- 
ative English Test C2: Reading Comprehension.” One group of students 
took Form Q of this test, the other Form R. The two forms are reported 
as comparable by the Cooperative Test Service. Therefore the results 
of the two groups are pooled in this paper as if all the students had taken 
the same Form. Four scores, as described above, are obtained from this 
test: Vocabulary, Speed of Comprehension, Level of Comprehension and 
Total Score. 

Intercorrelations of the various scores, i.e., verbal Scholastic Aptitude 
Test, English Essay Test and the reading test, have been computed from 
the scores of 263 students, members of two classes of freshmen. 


Results 


The results of this study are presented in terms of intercorrelations 
for 263 girls in a liberal arts college. This group represents the combined 
scores of the freshman class for the year 1943 and the year 1944. Since 
the mean scores and standard deviations for the two groups are closely 
similar only the composite table is reported here. These intercorrelations 
are given in Table 1. 

Some of the relationships shown in this table are somewhat different 


Table 1 


Intercorrelations, Means and Standard Deviations of the Six Variables for the 
Composite Group (N = 263) 











              A       B       C       D       E       F

A                    .51     .49     .77     .75     .19
B                            .74     .86     .60     .16
C                                    .89     .59     .14
D                                            .76     .18
E                                                    .21

Mean         67.3    57.8    61.9    63.2   496.0   514.5
Standard
Deviation    21.5    25.6    23.8    22.0    74.9    92.7





Legend: A = Vocabulary; B = Speed of comprehension; C = Level of comprehen- 
sion; D = Total score; E = Verbal Scholastic Aptitude Test; and F = English Essay 
Test. 




from what one might expect. For instance, vocabulary is at least as 
closely related to speed of comprehension in reading as it is to level of 
comprehension. It would also appear that vocabulary is an important 
factor in the Scholastic Aptitude Test since the correlation is so high— 
as a matter of fact the verbal Scholastic Aptitude Test (the test with 
which we have chosen to work) is made up of antonyms, analogies and 
paragraphs (5). It seems apparent, then, that vocabulary is an import- 
ant factor in aptitude for college work as determined by the Scholastic 
Aptitude Test. 

Level of comprehension and speed of comprehension are closely re- 
lated, but this is partially due to a spurious factor in the method of ob- 
taining these scores (3). Neither speed of comprehension nor level of 
comprehension appears to be as important a factor in Scholastic Aptitude 
Test as is vocabulary. 

Since the Total Reading score is a composite of the other three reading 
scores the correlations between this score and the other three are highly 
spurious. However level of comprehension seems to be the most import- 
ant of the three in determining the Total Score. It is evident that Total 
Scores on the reading test correlate highest with Scholastic Aptitude 
Test scores, thus substantiating the supposition of the authors of this 
test, that it “. . . should prove to be an excellent index of scholastic 
aptitude” (2). It will be noted, however, that this correlation is very 
little higher than that between vocabulary and the Scholastic Aptitude 
Test. This raises the question whether the verbal section of the Scho- 
lastic Aptitude Test tells us much more than vocabulary would in regard 
to aptitude for college work. 

The English Essay Test correlates with the other tests to a low degree, 
but all the correlations are significant at the 1% level. Noyes (8) states 
that he would not expect a high correlation between English Essay Test 
scores and vocabulary, or Scholastic Aptitude Test scores. His whole 
discussion seems to be in a hypothetical vein, but he feels that ability in 
English can be determined with precision when the two scores, English 
Essay and Scholastic Aptitude, are combined. He suggests no method 
of weighting, interpretation or procedure in this proposed combination. 
Apparently one can presume that this test measures a function largely 
independent of those measured by the other tests. 

When the scores of individual students in this study are considered, 
65 were in the lowest quartile according to Scholastic Aptitude Test 
scores, and of these 42 obtained scores below the 35th percentile in one or 
more of the reading scores. Thus 65% of those in the lowest quartile on 
the Scholastic Aptitude Test also obtained low percentile ranking on the 
reading test. 
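
The 65% figure is the share of students flagged by the first screen (lowest Scholastic Aptitude Test quartile) who were also flagged by the second (a reading score below the 35th percentile). A small sketch of that arithmetic follows; the function name is illustrative, and only the two counts come from the text.

```python
def share_flagged(flagged_by_first: int, also_flagged_by_second: int) -> float:
    """Proportion of cases flagged by one screen that a second screen also flags."""
    if flagged_by_first <= 0:
        raise ValueError("first screen must flag at least one case")
    return also_flagged_by_second / flagged_by_first


# Counts reported in the text: 65 students in the lowest S.A.T. quartile,
# 42 of whom fell below the 35th percentile on one or more reading scores.
overlap = share_flagged(65, 42)
print(f"{overlap:.1%}")
```

The remaining 23 of the 65 were low on the aptitude test without a low reading score. The text does not report how many low-scoring readers fell outside the lowest quartile, so the proportion of poor readers the screen would miss cannot be worked out from these counts.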






































Discussion 
There have been other studies which have correlated scores from
various tests, but the procedure and purpose of these studies have differed
from the present one, although in general the results have been in accord
with those found here.

Humber (9) found the relationship between “Honor Point Ratio,” 
general aptitude as measured by the American Council Examination and 
various reading tests, including the one used in this study, for seniors in 
various curricular fields. His results show that, with the exception of
dietetics, reading scores have greater predictive value for success in 
college than does general aptitude; reading scores are related to the 
humanities but infrequently to the sciences. Thus seniors are compara- 
tively homogeneous in aptitude, so that high achievement depends more 
on reading efficiency than on aptitude as measured by the American 
Council Examination. Of the correlations with achievement and scores 
from the Cooperative Reading Test the significant correlations, at the 
1% or 5% level, are with Speed of Comprehension, Level of Comprehen- 
sion and Total Score. There is no significant correlation with Vocabulary 
which, it will be recalled, was found to correlate higher with verbal 
Scholastic Aptitude Test scores than did Speed of Comprehension or 
Level of Comprehension. These findings of Humber seem to corroborate 
those reported above. 

An investigation by Williamson (10) finds a low predictive value for 
College Aptitude Test in relation to freshman grades and high school 
scholarship. He suggests possible causes of error which may vitiate the
results and offers as a solution increased personnel work to eliminate 
personal factors among students. One might presume, in view of 
Humber’s study (9) and the results presented here, that reading efficiency 
might be a significant variable in the relationship studied. 

The tests used in this study have been in whole or in part different 
from those in the two investigations quoted above. Those here studied 
are tests in national use, which should add to the applicability of the 
results and, apart from the specific need which this investigation was 
designed to meet, aid in the interpretation and application of scores de- 
rived from them. The results of Williamson (10) and Humber (9) sub- 
stantiate, though indirectly, the results found here. 


Conclusion 


From the results of this study it seems evident that reading efficiency
is an important factor in scores on the verbal Scholastic Aptitude Test.
Therefore, it is possible to use the verbal Scholastic Aptitude Test scores
as a preliminary screening device for students who need remedial reading
in college. It also appears that if the Scholastic Aptitude Test score is
included in the final screening, not only would this procedure be justified
but the selection of students would be made with greater reliability.

From the results shown above a remedial reading program will ap- 
parently have little effect upon courses in English Composition; but in 
view of the findings of others (7, 9) the results of such a program should 
be most evident in those subjects usually grouped under the headings of 
“Social Studies” and “Humanities.” 


Received July 30, 1945. 


References


1. Blommers, P., and Lindquist, E. F. Rate of comprehension of reading: Its
measurement and its relation to comprehension. J. educ. Psychol., 1944, 35, 449-473.

2. The Cooperative Reading Comprehension Tests. Cooperative Test Service, 1940.

3. Flanagan, J. C. A new type of reading test for secondary school and college
students which provides separate scores for speed of comprehension and level of
comprehension. Official Report of the Amer. Educ. Res. Ass., 1938.

4. Flanagan, J. C. A study of the effect on comprehension of varying speeds of
reading. Official Report of the Amer. Educ. Res. Ass.

5. College Entrance Examination Board, 44th Annual Report of the Executive
Secretary, 1944.

6. Supplementary information concerning the rating scale.

7. Chauncey, H. The Social Studies Test of the College Entrance Examination Board,
1943.

8. Noyes, E. S. Report on the English Essay Test of the College Entrance
Examination Board, 1943.

9. Humber, W. J. The relationship between reading efficiency and academic success
in selected university curricula. J. educ. Psychol., 1944, 35, 17-26.

10. Williamson, E. G. The decreasing accuracy of scholastic predictions. J. educ.
Psychol., 1937, 28, 1-16.














Book Reviews 


Steiner, Lee R. Where do people take their troubles? Boston: Houghton 
Mifflin Company, 1945. Pp. xiii + 265. $3.00. 


This is a needed book. Writing in the vein of a reporter, Mrs. 
Steiner has done “a study of the ways of men and women in trouble.” 
Her subjects are those persons who, whether forlorn, gullible, neurotic, 
undereducated, unloved, frustrated, or merely idle and bored, are moved 
to turn for help—or for something—to “the most common public opiates, 
all of them operating within existing law.” The details of the exploita- 
tion of these persons and of outright humbuggery which we can infer 
from the factual records of this book comprise the meat of the book. 

Here, preying upon persons in trouble, are a grotesque and sewery 
array of astrologers, numerologists, “voices,” radio counselors, Cosmic 
guides, Personal columnists, hypnologists, Psycho-Powerhouses, Success 
specialists, doctors of divine metaphysics—in short the pseudists of every 
stripe. They ply their lucrative trades without benefit of public license. 
The author knows them all at first hand and she makes no bones about 
describing their antics, their audacity, and their piracy. She always 
writes in good humor, yet with a serious purpose. 

Mrs. Steiner was professionally trained as a medical social worker and 
a psychiatric social worker. Her first move in the study of the psychologi- 
cal underworld was sanctioned by her professional colleagues and by the 
Illinois Society of Mental Hygiene. In order to secure cases, she listed 
herself in the “psychology column” of the Chicago telephone directory, 
classified section, as “The Advisory Service, for professional consultation 
in the personal, emotional and educational problems of normal people.” 
In response “everyone and anyone came.” Next she set up a long-distance 
mail-order business. Then, after a move to New York, she reversed 
her strategy. Posing as a prospective client and equipped with various 
imaginary problems, she visited the whole gamut of New York’s super- 
pseudists. Finally her net to snare people in trouble was put on the air 
with a weekly radio program. 

The result is a clever, well-documented exposé, with human interest 
sidelights. It is no small accomplishment to describe these professional 
quacks by name and do it in a way to avoid libel suits. The reader who 
is temperamentally an optimist may hope that the book will arouse some 
potential victims of psychological racketeering to realize how foolish and 
dangerous it is to look for happiness and well-being in such quarters. 



Early in her investigation, we infer, Mrs. Steiner got the itch to do 
“real” counseling herself, and apparently she has kept at it. The book is 
less explicit on this subject than it ought to be. Despite her crackdown 
on other “counselors,” little is said of her own methods and she does not 
appear in the slightest degree on the defensive about them. She worked 
under Alfred Adler and she speaks of her “patients” (p. 135). On p. 137 
we read “The Christian Scientists whom I have treated because of their 
‘religious conflicts’ have been suffering from struggles with . . . ‘sex.’” 
On p. 128 it is asserted that “Psychological homosexuality” “can usually 
be treated successfully.” 

But the important objective of the book is to point out a vast social 
problem. Mrs. Steiner urges the need for a nation-wide mental hygiene 
program which will include radio, extensive use of group (therapeutic) 
discussions, travelling clinics for small communities, and above all 
licensed counseling, personal and vocational, on a professional basis. 
Mrs. Steiner’s zeal leads her to proclaim that a “plan for education and 
professional psychological treatment is our next must in governmental 
interest.” (p. 253) 

Government, one may suspect, usually helps professions that help 
themselves, by maintaining high standards of training, service, and 
publicity. 


Richard M. Elliott 
University of Minnesota 


Gardner, Burleigh B. Human relations in industry. Chicago: Richard 
D. Irwin, Inc., 1945. Pp. xi + 307. $3.75. 


The author, who was for five years in charge of employee relations at 
Western Electric’s Hawthorne Plant, states in his preface that his purpose 
is “a systematic presentation of the human structure of industry.” The 
book fulfills this purpose. It is a good sociological description of the 
industrial organization. The structure is logically broken down in terms 
of line and staff, functional divisions, hierarchical levels, status, sex, age 
and class differences. These are well described and illustrated with case 
material. 

There is a chapter on personnel counselling in which the case for the 
non-directive method is summarized but in which there is not enough 
appreciation of the possible disruptive effects upon the relations between 
foreman and workers of even the best counselling program. There is an 
excellently done chapter on merit and incentive wage determination (but 
somewhat surprisingly no mention of job evaluation), and one on restriction 
of output. A chapter entitled “Problems of Cooperation” stresses 
the potential value of the method of consultative supervision and points 
to the need for effective communication. 

Allowing for the inevitable lack of integration which results from such 
an approach, the book is well organized. While it does not contain much 
that has not already been described by the Hawthorne research group, it 
does bring together in one volume and with some fresh illustrative ma- 
terial much that has been scattered heretofore among a number of books 
and articles. 

The reader with a psychological bias is likely to feel the lack of 
integration. The industrial organization as a living whole never emerges. 
In spite of the author’s obvious desire to present human relations in a 
dynamic fashion, one never gets beyond a static impression. The cake 
has been sliced in a variety of different ways, and one obtains a useful 
knowledge of what the pieces are like, but the final result is a neat stack 
of pieces of cake, and no more. 

This book points up sharply the lack of a coherent integrated set of 
principles of organized human effort. We have a wealth of descriptive 
literature not only of industry, but of church, school and state. Under- 
lying these different forms of organized striving there must be a few— 
probably rather simple—related principles. What are people trying to 
do through their organizations? How well do they succeed in doing it? 
What happens when they fail? What can be done about it? Projected 
against a framework of generalized answers to these questions, some of the 
critical problems of our day might become more understandable. 

No matter how well one understands the “human structure of in- 
dustry,” the basic problems of human relations in industry remain 
baffling. A knowledge of structure is helpful, just as a knowledge of 
anatomy is helpful in medicine. But it is not enough. The author of the 
book might well argue that we need more and better understanding of the 
anatomy of industrial organizations. Perhaps we do. But nevertheless 
the patient is seriously ill. What do we know that will help him to get 
better? 

It requires an almost impossible leap of imagination to go from the 
kind of description which characterizes this book to the evaluation of 
policies or an understanding of the important problems of industrial 
relations. Throughout the book one gets glimmerings of light concerning 
minute and specific problems. But the big and important problems of 
organized human effort are left in impenetrable fog. 

Perhaps this is asking too much. Perhaps we must proceed slowly 
and painstakingly to study and understand the industrial organization 
piecemeal before we will be able to understand and deal with the major 
issues that confront us today. Certainly this book is a useful adjunct to 
the body of descriptive material about industry. Nevertheless, con- 
sidering the breadth of experience and knowledge of the author, I cannot 
but profess some measure of disappointment in it. 


Douglas McGregor 
Massachusetts Institute of Technology 


Scheinfeld, Amram. Women and men. New York: Harcourt, Brace 
and Co., 1944. Pp. xv + 453. $3.50. 


A book bringing together from many sciences the relevant material 
on sex differences has long been needed. In Women and Men, a popular 
book, Scheinfeld has made a more comprehensive survey than is available 
in any technical book. He goes to the scientific literature in fields such 
as anatomy, physiology, medicine, genetics, psychology, sociology, an- 
thropology, and vital and economic statistics for his basic material. 
Each chapter has been read and criticized by scientists qualified in the 
area covered and contains citations to the appended extensive bibliogra- 
phy. There is little dogmatism in the book. In his preface, Scheinfeld 
suggests that the reader make a clear distinction between facts and 
interpretations. The easy style of the book should not blind the reader 
to the care with which it has been done, even though the scientific reader 
may wish for more detail. Because powerful attitudes are aroused, 
whenever sex differences are discussed, some readers will be critical of 
parts of the book. But this should not blind them to the major job done. 
Although I could suggest additional material within my own area of 
interest, I cannot but admire the manner in which its literature has been 
combed and significant conclusions brought together. 

Instead of emphasizing the cross-section material which is, however, 
adequately covered, Scheinfeld gives a longitudinal developmental 
picture of the sexes in their social setting and context. It is a long step 
forward from its predecessor, Men and Women, by the late Havelock 
Ellis (1894). Any comparison of the two will reveal the tremendous 
advances in the data that have become available, the objectivity with 
which the problems can now be approached, and the growth of under- 
standing that has come in half a century of scientific effort. 

Although the student of applied and industrial psychology will be 
interested particularly in those chapters which deal with the illnesses, 
the work, and the social relations of men and women, he will find that 
the whole book contains implications for the problem of the relation of 
the sexes in industry and society. Not only should personnel workers, 
counselors, and guidance experts read it; one can wish that employers and 
managers, politicians and officials, writers and propagandists, and 
femininists and traditionists read it. In a sense, this book marks the 
end of an era in which some measure of equality of opportunity for the 
sexes had to be won and the beginning of an era in which both theory and 
practice are more likely to be based realistically upon scientific data, with 
full recognition of the principle that men as men and women as women 
are personality systems affected by and affecting events and relations. 


John E. Anderson 
University of Minnesota 


Radvanyi, Laszlo. Public opinion measurement. A survey. Instituto 
Cientifico De La Opinion Publica Mexicana, 1945. Pp. 88. $1.00 
(U. S. cy.). 


Questionnaires were mailed to people with a known interest in public 
opinion research, most of whom were social scientists and journalists. 
The twelve questions covered various aspects of the scientific status of 
public opinion polls, their role in a democratic society, problems of main- 
taining their integrity, a few technical considerations, and the inter- 
national cooperation of institutes of public opinion. Although the 
survey was international in scope, this particular report is confined to the 
answers received from respondents in the United States. 

The first section reports the percentage of respondents giving each 
answer to each question and in many cases presents the results separately 
for social scientists and journalists. The second section gives the com- 
plete answers of a selected group of respondents for each question. 

The report provides very little basis for evaluating the accuracy of 
the survey. The method of making up the mailing list is not explained. 
The number of questionnaires mailed is not given, although it is stated 
that it was a “large number.” The problem of bias from a possible 
selective error in the returns is disposed of with the statement that “the 
answers received were so numerous and so varied in their origin that they 
can be considered as quite representative of the opinions of the respective 
groups.” 

To the reviewer, many of the questions appear ambiguous and sub- 
ject to various interpretations. The first question, for example, asks 
whether public opinion polls can be considered as “scientific method” 
and “regarded as an important factor in sociological, political, historical, 
and other research.” In the first place, to ask whether anything in the 
social sciences can be regarded as “scientific method” invites a variety 
of answers resulting from differences among people in their definitions of 
this term. In the second place, it is not clear to the reviewer whether 
this first question is one question or two: a method might be considered 
“scientific” and still not be an important factor in the development of 
the social sciences, and vice versa. 







In fairness to this survey, however, it must be admitted that the 
problems are so involved that the phrasing of questions is necessarily 
difficult and perhaps the questions are as good as could be expected under 
the circumstances. 

In spite of the limitations of this survey—or at least of the report— 
the results are of considerable value. The detailed answers of many 
authorities, given in the second part of the report, are well worth reading 
although they have been selected and are not necessarily representative. 
Furthermore, even these selected opinions demonstrate that the problems 
are important and that there is sufficient diversity of opinion to justify 
further study. 

Probably the spotlighting of these problems was the principal ob- 
jective of this survey; and, if this is so, it has accomplished its purpose. 

Alfred C. Welch 
Knox Reeves Advertising, Inc., 
Minneapolis, Minnesota 


Brandt, H. F. The psychology of seeing. New York: The Philosophical 
Library, 1945. Pp. 240. $3.75. 


Psychology of Seeing is based upon evidence from ocular photography. 
The author has organized materials derived from his published and unpublished 
eye-movement researches performed over a period of ten years. 
Although it is held that eye movements serve as objective symptoms of 
perceptual processes, it is also contended “that inefficient central processes 
over a period of time will result in faulty eye-movement patterns 
which will hinder efficient observation and learning.” This emphasis 
on the importance of “central processes” is to be commended. 

After describing the methods employed and certain basic eye-move- 
ment tendencies, major sections are devoted to the use of eye-movement 
photography in evaluating advertising, in the study of learning, and in 
analyzing responses to art objects. The book is concluded with a dis- 
cussion of the psychological implications of ocular patterns and a state- 
ment of projected studies. 

The eye-movement apparatus, designed by the author, is ideally 
suited for the kind of studies undertaken. With few exceptions the 
investigations were well planned and the data adequately treated. 
Interpretations, however, are frequently faulty or inadequate. The 
author anticipates this criticism by the warning that the reader may at 
times sense “occasional sweeping statements.” Unfortunately there 
are many typographical errors in both the text and the bibliography. 


Miles A. Tinker 
University of Minnesota 








New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to 
Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


Recent occupational trends in American labor—a supplement to occupational 
trends in the United States. Dewey Anderson and Percy E. Davidson. 
Stanford University: Stanford University Press, 1945. Pp. 133. 
$2.25. 

Diagnostic and remedial teaching in secondary schools. Glenn Myers 
Blair. New York: The Macmillan Co., 1946. Pp. 422. $3.25. 

Moving ahead on your job. Richard P. Calhoon. New York: McGraw- 
Hill Book Co., Inc., 1946. Pp. 295. $2.75. 

It’s how you take it. G. Colket Caner. New York: Coward-McCann, 
Inc., 1946. Pp. 152. $2.00. 

The application of measurement to health and physical education. H. 
Harrison Clarke. New York: Prentice-Hall, Inc., 1945. Pp. xvi 
+ 415. 

How heredity builds our lives: an introduction to human genetics and eugenics. 
Robert Cook and Barbara S. Burks. Washington, D. C.: American 
Genetic Association, 1946. Pp. 64. $.75. 

The executive in action. M. E. Dimock. New York and London: 
Harper & Brothers, 1945. Pp. ix + 276. $3.00. 

Guidance practices at work. Clifford E. Erickson. New York: McGraw- 
Hill Book Co., Inc., 1946. Pp. 339. $3.25. 

Occupations: a selected list of pamphlets. Gertrude Forrester. New 
York: The H. W. Wilson Co., 1946. Pp. 240. $2.25. 

Trends in employment and earnings for nineteen graduating classes of a 
teachers college. John S. French. New York: Teachers College, 
Columbia University, 1945. Pp. vi + 103. $1.85. 

Vocational aptitude tests for the blind. Samuel P. Hayes. Watertown: 
Perkins Publications, Perkins Institution and Massachusetts School 
for the Blind, 1946. Pp. 32. $.25. 

Human leadership in industry: challenge of tomorrow. Sam A. Lewisohn. 
New York and London: Harper & Brothers, 1945. Pp. ix + 112. 
$2.00. 

Psychology in industry. Norman R. F. Maier. Boston: Houghton 
Mifflin Co., 1946. Pp. 463. $3.00. 




Group psychotherapy: A symposium. J.L. Moreno. New York: Beacon 
House, 1945. Pp. 305. $5.00. 

Signs, language and behavior. Charles Morris. New York: Prentice- 
Hall, Inc., 1946. Pp. 365. $5.00. 

Psychology. Norman L. Munn. Boston: Houghton Mifflin Co., 1946. 
Pp. 497. $3.25. 

The adolescent in social groups. Frances Burks Newman. Stanford 
University: Stanford University Press, 1946. Pp. 94. $1.25 p. 
$2.00 cl. 

Youth, marriage and parenthood. Lemo D. Rockwood and Mary Ford. 
New York: John Wiley and Sons, Inc., 1945. Pp. 279. $3.00. 

Evaluation of group guidance work in secondary schools. Georgia May 
Sachs. Los Angeles: The University of Southern California Press, 
1946. Pp. 120. $2.50. 

Marriage and the family. Edgar Schmiedeler. New York: McGraw- 
Hill Book Co., Inc., 1946. Pp. 285. $1.80. 

Collective bargaining. Leonard J. Smith. New York: Prentice-Hall, 
Inc., 1946. Pp. 416. $3.75. 

The dynamics of human adjustment. Percival M. Symonds. New York: 
D. Appleton-Century Company, 1946. Pp. 674. $5.00. 

The psychology of normal people. Revised. Joseph Tiffin, Frederic B. 
Knight, and Easton Jackson Asher. Chicago: D. C. Heath and Co., 
1946. Pp. 550. $3.00. 

Hitler’s professors. The part of scholarship in Germany’s crimes against 
the Jewish people. Max Weinreich. New York: Yiddish Scientific 
Institute—Yivo, 1946. Pp. 291. $3.00. 

Controlled eye movements versus practice exercises in reading. Frederick 
Lowell Westover. New York: Bureau of Publications, Teachers 
College, Columbia University, 1946. Pp. 99. $1.95. 

Proceedings of the third annual visual education institute. W. A. Wittich. 
Madison: University of Wisconsin Summer Session, 1945. Pp. 114. 



















